ABSTRACT



An adaptive recursive digital filter is presented in 
which feedback and feedforward gains are adjusted adaptively 
to minimize a least square performance function on a sliding win- 
dow averaging process. A two-dimensional version of the adap- 
tive filter is developed and its performance compared with 
the optimal Wiener filter. The filter is shown to be effec- 
tive in separating three diagonal trajectory streaks from a 
background of correlated noise added to white noise. Although 
the recursive adaptive filter approaches the optimal Wiener 
filter in performance, it does not require a priori statistical 
knowledge as does the Wiener filter to which it is compared. 

The results indicate that the recursive adaptive filter "learns" 
the statistics and adapts. 
I. INTRODUCTION



The term "filter" is often applied to any device or 
system that processes incoming signals or other data in such a 
way as to eliminate noise, to smooth signals, to identify each 
signal as belonging to a particular class, or to predict the 
input signal from instant to instant. There is an abundance 
of literature covering the theories involved under the head- 
ings of estimation, identification, modeling, prediction, etc. 
The usual method of estimating a signal corrupted by noise is 
to pass it through a filter that tends to suppress the noise 
while leaving the signal relatively unchanged. The design of 
such filters falls in the domain of optimal filtering, which
originated with the pioneering work of Wiener [8] and was ex- 
tended and enhanced by the work of Kalman-Bucy [9] and others. 

Filters used for the above purpose can be fixed or adap- 
tive. The design of a fixed filter is based on a priori know- 
ledge of both signal and noise statistics. On the other hand, 
adaptive filters have the ability to adjust their own para- 
meters automatically, and their design requires little or 
no a priori knowledge of signal or noise characteristics. 

This work presents an approach to signal filtering using an
adaptive filter that is in some sense self-designing.

The adaptive filter described here bases its own "design" (its 
adjustment of internal parameters) upon estimated (measured) sta- 
tistical characteristics of input and output signals. 






The statistics are not measured explicitly and then used 
to design the filter; rather, the filter design is accomplished 
in a single process by a recursive algorithm that automatically 
updates the system parameters with the arrival of each new data 
sample. It is assumed that the input and the output at the 
sampling instants are the only measurable quantities of the 
system. It is also assumed that the unknown filter coefficients
(parameters) to be designed enter linearly in the difference
equations which describe the self-designing process.

The steepest descent method is employed in which the pre- 
vailing filter parameter vectors are perturbed at each itera- 
tion in a manner so as to decrease a prescribed functional 
(error criterion or cost function) to be minimized. The steepest
descent method is one of the well-known gradient-based
algorithms.

For the case where the functional being minimized is the
mean square error, where the error is the difference between the
filter output signal and the desired signal, the filter is
called the least mean square filter (LMS filter). Various
adaptive algorithms are currently available, depending upon
the cost function and the method used to minimize it.
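The steepest descent idea described above can be sketched for the simplest case, a nonrecursive (FIR) filter adapted on the instantaneous squared error. This is only an illustrative sketch and not the recursive algorithm developed in this work: the two-tap system `w_true`, the step size `mu`, the noise-free desired signal, and the function name `lms_identify` are all assumptions made for the example.

```python
import numpy as np

def lms_identify(x, d, n_taps, mu):
    """Adapt FIR weights by steepest descent on the instantaneous squared error."""
    w = np.zeros(n_taps)
    for n in range(n_taps - 1, len(x)):
        window = x[n - n_taps + 1:n + 1][::-1]   # x(n), x(n-1), ...
        e = d[n] - w @ window                    # error = desired - filter output
        # The gradient of e^2 with respect to w is -2*e*window; the factor 2
        # is absorbed into the step size mu.
        w = w + mu * e * window
    return w

rng = np.random.default_rng(0)
x = rng.standard_normal(5000)                    # white input sequence
w_true = np.array([0.8, -0.3])                   # hypothetical unknown system
d = np.convolve(x, w_true)[:len(x)]              # noise-free desired signal
w_hat = lms_identify(x, d, n_taps=2, mu=0.05)
```

With a noise-free desired signal the weights settle on `w_true`; with noisy data they would only fluctuate around it, which is the estimation error referred to in the text.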

The popularly used performance criteria are the least mean 
square criterion, the maximum likelihood ratio (MLR) criterion, 
and the maximum signal to noise ratio (SNR) criterion. Here
only the LMS criterion is studied, and the steepest descent
method is employed. Inevitable errors in the estimation of
the statistics prevent the adaptive filter from delivering 
optimal performance since the adaptive filter is not based on 
the a priori knowledge of statistics. In Chapter II, the con- 
cept of linear stochastic processes is reviewed as a preliminary 
study for this thesis, and the modeling of stochastic processes 
is studied. These can be considered as background material 
for the following chapters. 

In Chapter III, the concept of adaptive filters is introduced
and the structure of the signal and the mathematical
model of the processor are presented. The algorithm for the
nonrecursive adaptive filter by Widrow [1] is reviewed, and a new
algorithm for the recursive adaptive filter is developed, as is
the two-dimensional adaptive filtering process.

In Chapter IV, the adaptive noise cancelling concept is 
analyzed rather qualitatively and its application to the special 
case in which no desired signal is available is analyzed. In 
Chapter V an experiment is performed through computer simulation 
to check the feasibility of algorithms developed in the pre- 
vious chapter and a comparison with the optimal Wiener solution 
is made. In Chapter VI, the conclusions are presented together 
with a summary of the experimental results and suggestions for 
further research. 
II. LINEAR STOCHASTIC PROCESSES



A. INTRODUCTION 

The problem of defining a random process is of considerable 
importance in the analysis of systems subject to noise distur- 
bance. Often a partial definition of the process will suffice 
as in the case of linear least mean square error filtering,
where only a knowledge of the correlation function is required. 
For other problems, such as those involving nonlinear filtering, 
more complete information will generally be needed. A complete 
description of a random process requires a knowledge of the 
distribution functions of all orders. But in practice few 
processes apart from the normal and Markov are defined in this 
manner. For the purpose of analysis, a model to generate the 
random process is desirable and for a model to give a complete 
description of the process, the distribution functions should be 
derivable from the model. While both continuous and discrete-time
linear processes may be defined, only the discrete-time case
will be considered here. The discrete-time linear process
can be considered to be the result of the digital filtering of 
a sequence of independent and identically distributed (IID) 
random variables. 

The linear processes are important since they are inherently 
simple in terms of physical considerations and form a class 
which includes many discrete time normal random processes. 

In the following section the definition and properties of the 
linear processes are summarized. 



B. DEFINITION AND PROPERTIES OF LINEAR STOCHASTIC PROCESSES 

It has been found useful in the theory of stochastic processes 
to divide stochastic processes into two broad classes: sta- 

tionary and nonstationary. Intuitively, a stationary process 
is one whose distribution remains the same as time progresses, 
because the random mechanism producing the process is not chang- 
ing as time progresses. A nonstationary process is one which 
is not stationary. 

Let $\{x(i),\ i \in T\}$ be a stochastic process with finite second
moments. Its mean value sequence, denoted by $m(i)$, is defined
for all $i$ in $T$ by

$$m(i) = E[x(i)] \tag{2-1}$$

and its covariance kernel, denoted by $K(j,i)$, is defined
for all $j$ and $i$ in $T$ by

$$K(j,i) = \mathrm{Cov}[x(j),\, x(i)] \tag{2-2}$$

An index set $T$ is said to be a linear index set if it has the
property that the sum $i + j$ of any two members $i$ and $j$ of $T$ also
belongs to $T$. Examples of such index sets are $T = \{1, 2, \ldots\}$,
$T = \{0, \pm 1, \pm 2, \ldots\}$, $T = \{i : i \ge 0\}$ and
$T = \{i : -\infty < i < \infty\}$.

A stochastic process $\{x(i),\ i \in T\}$, whose index set $T$ is linear,
is said to be

i) strictly stationary of order $k$, where $k$ is a given positive
integer, if for any points $i, i+1, \ldots, i+k$ in $T^+$, where
$T^+ = \{i : i \ge 0\}$, and any $j$ in $T^+$, the random
vectors $\{x(i), x(i+1), \ldots, x(i+k)\}$ and $\{x(i+j), \ldots, x(i+j+k)\}$
are identically distributed.

ii) strictly stationary if for any integer $k$, it is
strictly stationary of order $k$.

iii) wide sense stationary (covariance stationary) if it
possesses finite second moments, if its index set $T$ is linear,
and if its covariance kernel $K(j,i)$ is a function only of the
absolute difference $|j-i|$, in the sense that there exists a
function $R_{xx}(n)$ such that for all $j$ and $i$ in $T^+$

$$K(j,i) = \mathrm{Cov}[x(j),\, x(i)] = R_{xx}(j-i) \tag{2-3}$$

or more precisely, $R_{xx}(m)$ has the property for every $i$ and $i+m$ in $T^+$

$$\mathrm{Cov}[x(i),\, x(i+m)] = E[x(i)\,x(i+m)] = R_{xx}(m) \tag{2-4}$$

We call $R_{xx}(m)$ the covariance function (autocorrelation function)
of the wide sense stationary time series $\{x(i),\ i \in T^+\}$.

The second problem concerns the concepts of ergodicity
and the strong law of large numbers in terms of linear processes.
To present a complete discussion of this question is not reasonable
for review purposes, but it is interesting to consider
certain aspects of it. For strict sense stationary processes,
the ergodic theorem is the strong law of large numbers
and states that

if $\{x(i),\ i \in T^+\}$ is a strict sense stationary, ergodic
random process and $E[\,|x(0)|\,] < \infty$, then

$$\lim_{m \to \infty} \frac{1}{m} \sum_{i=1}^{m} x(i) = E[x(0)] \quad \text{with probability 1.} \tag{2-5}$$
In general, a stochastic process is said to be ergodic if it
has the property that sample (or time) averages formed from
an observed sequence of the process may be used as an approximation
to the corresponding ensemble average.
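The statement in (2-5) can be illustrated numerically. The sketch below assumes an IID Gaussian sequence (which is strictly stationary and ergodic) with an ensemble mean `true_mean` chosen arbitrarily for the illustration, and forms the time average over a short and a long record.

```python
import numpy as np

rng = np.random.default_rng(1)
true_mean = 2.0                                # ensemble mean E[x(0)], chosen for illustration
x = true_mean + rng.standard_normal(200_000)   # IID sequence: stationary and ergodic

def time_average(x, m):
    """(1/m) * sum of the first m samples, the left side of (2-5) before the limit."""
    return x[:m].mean()

avg_small = time_average(x, 100)       # short record: rough approximation
avg_large = time_average(x, 200_000)   # long record: close to the ensemble mean
```

The long-record average is expected to lie much closer to `true_mean` than the short-record one, which is the practical content of ergodicity: a single observed sequence stands in for the ensemble.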

Stationarity and ergodicity concepts are readily extended to
two-dimensional random fields (for a two-dimensional signal, the
term random field is preferred to random process).

The assumption that the field is stationary means that
the statistics of a point in the field are not dependent on the
location of the point. Then, a stationary, two-dimensional
field has an autocorrelation function defined as

$$R_{xx}(m,n) \triangleq E\{x(k,\ell)\, x(k+m,\ell+n)\} \tag{2-6}$$

and it is also said to be ergodic if the statistical (ensemble)
average of the random field $x(k,\ell)$ is equal to the spatial
average over all points. That means

$$E[x(k,\ell)] = \langle x \rangle \tag{2-7}$$

where $\langle x \rangle$ by definition represents spatial averaging

$$\langle x \rangle = \lim_{M_1, M_2 \to \infty} \frac{1}{M_1 M_2} \sum_{k=0}^{M_1} \sum_{\ell=0}^{M_2} x(k,\ell) \tag{2-8}$$



Now, consider a stationary sequence of random variables
$\{x(i),\ i \in T\}$. The correlation function of the sequence may
be written in the form

$$R_{xx}(n) = \int_{-\pi}^{\pi} e^{iwn}\, dF(w) \tag{2-9}$$
where $F(w)$ is a nondecreasing function, called the spectral
distribution. If $F(w)$ is absolutely continuous with $F'(w) = f(w)$
almost everywhere, we may write

$$R_{xx}(n) = \int_{-\pi}^{\pi} e^{iwn} f(w)\, dw \tag{2-10}$$

Under certain conditions (e.g., $\sum_{n=0}^{\infty} |R_{xx}(n)| < \infty$), the
correlation function may be inverted to yield the spectral
density as

$$f(w) = \frac{1}{2\pi} \sum_{n=-\infty}^{\infty} R_{xx}(n)\, e^{-inw} \tag{2-11}$$

A sequence of random variables $\{x(i),\ i \in T\}$ is said to be a
moving average process if it admits the mean square representation

$$x(i) = \sum_{j=-\infty}^{\infty} h(i-j)\, u(j) \tag{2-12}$$

where $\{u(j),\ j \in T\}$ is a collection of orthonormal random
variables (a white noise sequence). If the sequence $\{h(i),\ i \in T\}$
is one sided (i.e., $h(i) = 0$ for $i < 0$), then $\{x(i),\ i \in T\}$ is
called a one sided moving average process.

A process is said to be regular if the error for prediction 
one time unit ahead is nonzero. It is known that a process is 
one of moving averages if and only if its spectral distribution 
function is absolutely continuous. Furthermore, a process with 
an absolutely continuous spectral distribution function is 
regular if its moving average representation is one sided. These 
facts serve to motivate the following definition of a linear
process.



DEFINITION [5]

A linear process $\{x(i),\ i \in T\}$ is one having the structure

$$x(j) = \sum_{i=0}^{\infty} h(i)\, u(j-i) = \sum_{i=-\infty}^{j} h(j-i)\, u(i), \tag{2-13}$$

where $\{u(i),\ i \in T\}$ is a sequence of independent and identically
distributed (IID) random variables. The set of real constants
$\{h(i),\ i \in T^+\}$ is such that $\sum_{i=0}^{\infty} |h(i)| < \infty$, and the function
$H(z) = \sum_{i=0}^{\infty} h(i)\, z^{-i}$, where $z$ is a complex variable, is analytic
and has no poles outside the unit circle in the $z$ plane. The
correlation function of this process is given by

$$R_{xx}(n) = \sum_{j=0}^{\infty} h(j)\, h(j+n) \tag{2-14}$$

and the corresponding spectral density $f(w)$ is

$$f(w) = \frac{1}{2\pi} \left| \sum_{j=0}^{\infty} h(j)\, e^{-ijw} \right|^2 = \frac{1}{2\pi} \left| H(e^{iw}) \right|^2 \tag{2-15}$$
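Relations (2-11), (2-14) and (2-15) can be checked against each other for a short impulse response. The sketch below assumes a hypothetical two-term response `h = [1.0, 0.5]` and a unit-variance white input; the function names are illustrative only.

```python
import numpy as np

h = np.array([1.0, 0.5])            # hypothetical impulse response h(0), h(1)

def autocorr(h, n):
    """R_xx(n) = sum_j h(j) h(j+n), as in (2-14), for unit-variance white input."""
    n = abs(n)
    return float(np.dot(h[:len(h) - n], h[n:])) if n < len(h) else 0.0

def density_from_h(h, w):
    """f(w) = |H(e^{iw})|^2 / (2 pi), as in (2-15)."""
    j = np.arange(len(h))
    H = np.sum(h * np.exp(-1j * w * j))
    return abs(H) ** 2 / (2 * np.pi)

def density_from_R(h, w):
    """f(w) = (1 / 2 pi) * sum_n R_xx(n) e^{-inw}, as in (2-11)."""
    lags = np.arange(-(len(h) - 1), len(h))
    s = sum(autocorr(h, n) * np.exp(-1j * n * w) for n in lags)
    return s.real / (2 * np.pi)
```

For this `h`, both routes give $f(w) = (1.25 + \cos w)/2\pi$, so the spectral density obtained from the correlation function agrees with the one obtained directly from the filter.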

It is assumed further that the process (2-13) has a rational
spectral density, that is

$$f(w) = \frac{1}{2\pi} \left| H(e^{iw}) \right|^2 = \frac{1}{2\pi} \left| \frac{B(e^{iw})}{A(e^{iw})} \right|^2 \tag{2-16}$$

where both $A(e^{iw})$ and $B(e^{iw})$ are polynomials in $e^{iw}$ of
finite order with all their zeros inside the unit circle.

The process (2-13) may be generated by passing the sequence
$\{u(i),\ i \in T\}$ through the digital filter $H(z)$. By the assumption
in the definition of the linear process and by the restrictions
on the spectrum, there exists an inverse $D(z)$, where

$$D(z) = \frac{1}{H(z)}$$

In general, $D(z)$ and $H(z)$ may be infinite polynomials in $z^{-1}$.
$D(z)$ may be written as the one-sided sequence

$$D(z) = \sum_{i=0}^{\infty} d_i\, z^{-i} \tag{2-17}$$

Passing the $x(i)$ sequence through the digital filter $D(z)$ will
recover the generating sequence $u(i)$, that is

$$u(j) = \sum_{i=-\infty}^{j} d(j-i)\, x(i) = \sum_{i=0}^{\infty} d(i)\, x(j-i) \tag{2-18}$$

This is called an operation inversion. In general, we will have,
assuming $a_0 = 1$,

$$H(z) = \frac{\sum_{i=0}^{n} b_i\, z^{-i}}{1 + \sum_{i=1}^{m} a_i\, z^{-i}} = \sum_{i=0}^{\infty} h(i)\, z^{-i} \tag{2-19}$$

and the process $\{x(i),\ i \in T\}$ may be represented in two ways:

$$\text{i)} \quad x(j) = \sum_{i=0}^{\infty} h(i)\, u(j-i)$$

$$\text{ii)} \quad x(j) = \sum_{i=0}^{n} b_i\, u(j-i) - \sum_{i=1}^{m} a_i\, x(j-i) \tag{2-20}$$
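The equivalence of the two representations can be sketched numerically: the recursive difference equation ii) and the convolution with the impulse response i) give the same output. The coefficient values `b` and `a` below are hypothetical, and the input is assumed to be zero before the first sample.

```python
import numpy as np

def recursive_filter(b, a, u):
    """x(j) = sum_i b_i u(j-i) - sum_{i>=1} a_i x(j-i), with a[0] = 1 (representation ii)."""
    x = np.zeros(len(u))
    for j in range(len(u)):
        acc = sum(b[i] * u[j - i] for i in range(len(b)) if j - i >= 0)
        acc -= sum(a[i] * x[j - i] for i in range(1, len(a)) if j - i >= 0)
        x[j] = acc
    return x

b = [1.0, 0.4]          # hypothetical feedforward coefficients b_0, b_1
a = [1.0, -0.5]         # hypothetical feedback coefficients a_0 = 1, a_1

# Impulse response h(i): the output when u is a unit sample at j = 0.
N = 50
impulse = np.zeros(N)
impulse[0] = 1.0
h = recursive_filter(b, a, impulse)

rng = np.random.default_rng(2)
u = rng.standard_normal(N)
x_recursive = recursive_filter(b, a, u)     # representation ii)
x_convolved = np.convolve(h, u)[:N]         # representation i), first N samples
```

The first `N` output samples depend only on `h(0), ..., h(N-1)`, so truncating the impulse response to `N` terms does not introduce any error over this record.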

The second representation indicates that if $H(z)$ is an all-pole
function, then we have

$$\text{iii)} \quad \sum_{i=0}^{m} a_i\, x(j-i) = u(j). \tag{2-21}$$

In this case, since inversion uses only a finite number of past
samples, the process is called "finitely invertible." It is
clear that finitely invertible linear processes form a subclass
of autoregressive schemes, for which the set $\{u(i)\}$ in
(2-21) would be orthonormal rather than independent. The concepts
and definitions above can be readily extended to a



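two-dimensional linear process. A two-dimensional linear
process $\{x(m,n),\ m \in Z_1,\ n \in Z_2\}$ could have the structure

$$x(m,n) = \sum_{k=0}^{\infty} \sum_{\ell=0}^{\infty} h(k,\ell)\, u(m-k,\, n-\ell) = \sum_{k=-\infty}^{m} \sum_{\ell=-\infty}^{n} h(m-k,\, n-\ell)\, u(k,\ell), \tag{2-22}$$

where $\{u(k,\ell),\ k \in Z_1,\ \ell \in Z_2\}$ is a two-dimensional sequence of
IID random variables with zero mean and unit variance. The set
of real constants $\{h(k,\ell),\ k \in Z_1^+,\ \ell \in Z_2^+\}$ is such that
$\sum_{k=0}^{\infty} \sum_{\ell=0}^{\infty} |h(k,\ell)| < \infty$, and the function
$H(Z_1, Z_2) = \sum_{k=0}^{\infty} \sum_{\ell=0}^{\infty} h(k,\ell)\, Z_1^{-k} Z_2^{-\ell}$,
where $Z_1$ and $Z_2$ are complex variables, is analytic. The correlation
function of this process is given by

$$R_{xx}(m,n) = \sum_{k=0}^{\infty} \sum_{\ell=0}^{\infty} h(k,\ell)\, h(k+m,\, \ell+n) \tag{2-23}$$

and the corresponding spectral density $f(w_1, w_2)$ is

$$f(w_1,w_2) = \frac{1}{4\pi^2} \left| \sum_{k=0}^{\infty} \sum_{\ell=0}^{\infty} h(k,\ell)\, e^{-iw_1 k} e^{-iw_2 \ell} \right|^2 = \frac{1}{4\pi^2} \left| H(e^{iw_1}, e^{iw_2}) \right|^2 \tag{2-24}$$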



With the same reasoning as in the one-dimensional case, we have
in general,

$$H(Z_1,Z_2) = \frac{\sum_{i=0}^{L_1} \sum_{j=0}^{L_2} b_{ij}\, Z_1^{-i} Z_2^{-j}}{1 + \sum_{i=0}^{M_1} \sum_{\substack{j=0 \\ (i,j) \ne (0,0)}}^{M_2} a_{ij}\, Z_1^{-i} Z_2^{-j}} = \sum_{k=0}^{\infty} \sum_{\ell=0}^{\infty} h(k,\ell)\, Z_1^{-k} Z_2^{-\ell} \tag{2-25}$$

and the process $\{x(k,\ell),\ k \in Z_1,\ \ell \in Z_2\}$ may be represented in two
ways:

$$\text{i)} \quad x(k,\ell) = \sum_{i=0}^{\infty} \sum_{j=0}^{\infty} h(i,j)\, u(k-i,\, \ell-j)$$

$$\text{ii)} \quad x(k,\ell) = \sum_{i=0}^{L_1} \sum_{j=0}^{L_2} b_{ij}\, u(k-i,\, \ell-j) - \sum_{i=0}^{M_1} \sum_{\substack{j=0 \\ (i,j) \ne (0,0)}}^{M_2} a_{ij}\, x(k-i,\, \ell-j) \tag{2-26}$$

It should be noted that the moving average scheme would be
accomplished by passing the orthonormal IID random variables through
the nonrecursive filter and the autoregressive schemes through
the recursive filter for both one-dimensional and two-dimensional
random processes.

In the study of systems subject to random signals, the
concept of the power spectral density function is of importance.
For a given transfer function of a linear filter, the cross power
spectrum between input and output of the filter and the output
power spectrum are of primary concern.

Consider a (continuous) system subjected to a random input
signal. Given a linear system with a transfer function
$H(jw)$ whose input is a stationary process $x(t)$
with an autocorrelation function $R_{xx}(\tau)$ and a power spectral
density function $G_{xx}(w)$, the following relationships
are obtained:

$$G_{xy}(w) = G_{xx}(w)\, H^{*}(jw)$$
$$G_{yy}(w) = G_{xy}(w)\, H(jw) \tag{2-27}$$

Combining the above two equations,

$$G_{yy}(w) = G_{xx}(w)\, |H(jw)|^2. \tag{2-28}$$
output power spectrum, the cross correlation would be cal-



For the continuous two-dimensional linear system
$H(jw_1, jw_2)$, subjected to a stationary two-dimensional input
signal of power spectrum $G_{xx}(w_1, w_2)$,

$$G_{xy}(w_1,w_2) = G_{xx}(w_1,w_2)\, H^{*}(jw_1, jw_2)$$
$$G_{yy}(w_1,w_2) = G_{xy}(w_1,w_2)\, H(jw_1, jw_2) \tag{2-29}$$

Again combining the above two relationships, the output power
spectral density function is

$$G_{yy}(w_1,w_2) = G_{xx}(w_1,w_2)\, |H(jw_1, jw_2)|^2 \tag{2-30}$$

Consider a discrete linear system with a transfer function $H(z)$
to which a random input sequence is applied. The input
sequence is a stationary process $x(i)$ with an autocorrelation
function $R_{xx}(m)$ and its z-transform $G_{xx}(z)$, where $G_{xx}(z)$ is
equivalent to the power spectral density in the continuous case, i.e.,

$$G_{xx}(z) = \sum_{m=-\infty}^{\infty} R_{xx}(m)\, z^{-m} \tag{2-31}$$

Then

$$G_{xy}(z) = G_{xx}(z)\, H(z)$$
$$G_{yy}(z) = G_{xx}(z)\, H(z)\, H(z^{-1}) \tag{2-32}$$



For the two-dimensional discrete case 
As a summary of this section, a discrete-time linear process can be
considered to be the result of digitally filtering an independent,
identically distributed random sequence having zero mean and unit variance.
The moving average scheme is the result of filtering through the
nonrecursive filter and the autoregressive scheme is that of
filtering through the recursive filter.

For a linear system, the relations between transfer
function, power spectrum, and autocorrelation are given by

$$G_{yy} = G_{xx}\, |H|^2$$
$$G_{xy} = G_{xx}\, H$$

R (autocorrelation function) = inverse transform of the power
spectral density function.

Figure 2-1 shows the block diagram which describes
the various relations and concepts.
FIGURE 2-1
STATISTICAL PROPERTIES OF LINEAR SYSTEM
C. MODELING OF LINEAR STOCHASTIC PROCESS

1. Introduction

A more active concern at this time is that of system
modeling. It has been shown in a previous section that a linear
stochastic process (or field) can be generated by filtering
white noise through a linear filter. The problem can be stated
as follows. What is the filter equation (difference equation)
that produces a typical random process with a specified autocorrelation
function? That is, with the knowledge of second-order statistics,
determine the filter coefficients, the a's and b's in equations (2-20)
and (2-26). It is clear that if one is successful in developing
a parametric model for the behavior of some random process,
then the model can be used for different applications, such as
prediction, estimation, smoothing, etc. As far as the general
modeling problem goes, one of the most powerful models currently
in use is that where a signal $x(n)$ is considered to be the output
of some system (filter) with unknown input $u(n)$ such that
the following relationship holds

$$x(n) = -\sum_{k=1}^{p} a_k\, x(n-k) + G \sum_{\ell=0}^{q} b_\ell\, u(n-\ell), \qquad b_0 = 1 \tag{2-34}$$

where $a_k$, $1 \le k \le p$, $b_\ell$, $1 \le \ell \le q$,
and the gain $G$ are the parameters of the hypothesized system.
Equation (2-34) says that the "output" $x(n)$ is a linear function
of past outputs and present and past inputs. That is, the
signal $x(n)$ is predictable from linear combinations of past
outputs and inputs. For the two-dimensional case, the difference
equation corresponding to equation (2-34) would be

$$x(k,\ell) = -\sum_{i=0}^{M_1} \sum_{j=0}^{M_2} a_{ij}\, x(k-i,\, \ell-j) + G \sum_{\ell_1=0}^{L_1} \sum_{\ell_2=0}^{L_2} b_{\ell_1 \ell_2}\, u(k-\ell_1,\, \ell-\ell_2), \quad \text{where } (i,j) \ne (0,0) \tag{2-35}$$

Equations (2-34) and (2-35) can also be specified in the frequency
domain by taking the z transform of both sides of Eq. (2-34) and
Eq. (2-35):

$$H(Z) = \frac{X(Z)}{U(Z)} = G\, \frac{1 + \sum_{\ell=1}^{q} b_\ell\, Z^{-\ell}}{1 + \sum_{k=1}^{p} a_k\, Z^{-k}} \tag{2-36}$$

where $X(Z) = \sum_{n=-\infty}^{\infty} x(n)\, Z^{-n}$ is the Z transform of $x(n)$, and $U(Z)$
is the Z transform of $u(n)$.

For the two-dimensional case,

$$H(Z_1,Z_2) = \frac{X(Z_1,Z_2)}{U(Z_1,Z_2)} = G\, \frac{\sum_{\ell_1=0}^{L_1} \sum_{\ell_2=0}^{L_2} b_{\ell_1 \ell_2}\, Z_1^{-\ell_1} Z_2^{-\ell_2}}{1 + \sum_{i=0}^{M_1} \sum_{\substack{j=0 \\ (i,j) \ne (0,0)}}^{M_2} a_{ij}\, Z_1^{-i} Z_2^{-j}} \tag{2-37}$$

where $X(Z_1,Z_2) = \mathcal{Z}[x(k,\ell)]$ and $U(Z_1,Z_2) = \mathcal{Z}[u(k,\ell)]$.
$H(Z)$ and $H(Z_1,Z_2)$ in Eqs. (2-36) and (2-37) are the general pole-zero
models.
The roots of the numerator and denominator polynomials are the
zeros and poles of the model, respectively. There are two
special cases of the model that are of interest:
i) all-zero model: $a_k = 0$, $a_{ij} = 0$
ii) all-pole model: $b_\ell = 0$, $b_{\ell_1 \ell_2} = 0$, for
$1 \le \ell \le q$, $0 \le \ell_1 \le L_1$, $0 \le \ell_2 \le L_2$,
but $(\ell_1, \ell_2) \ne (0,0)$.

As mentioned in section II-B, the all-zero model is known
in the statistical literature as the moving average (MA) model
and the all-pole model is known as the autoregressive (AR)
model. The pole-zero model is then known as the autoregressive
moving average (ARMA) model. It should be recalled here that we
are interested in the case where $u(n)$ or $u(k,\ell)$ is white noise, and
this will be treated as a special case in the following. The
modeling problem can be stated as "given a signal $x(n)$ or $x(k,\ell)$,
find the filter coefficients (a's and b's) and the gain $G$ in
Equation (2-34) in some manner." Two approaches will be given for
a solution of the above problem. The first is the method of least
squares, which is based on the optimal estimation concept, and the
second is the filter response method, in which linear system
properties are used. The one-dimensional case will be treated
first, then the two-dimensional case, including some examples. As
examples, a lowpass correlated random process (field) and a band
limited random process (field) are chosen since they represent
practical examples.

2. The Method of Least Squares

Although the following can be applied to deterministic
signals and stochastic processes, stationary or nonstationary,
it is emphasized here only for a stationary random process, and only the
all-pole model is considered [6]. In the all-pole model, we
assume that the signal $x(n)$ is given as a linear combination
of past values and some input $u(n)$:

$$x(n) = -\sum_{k=1}^{p} A_k\, x(n-k) + G\, u(n) \tag{2-38}$$

where $G$ is a gain factor.

Here, it is assumed that the input $u(n)$ is totally unknown,
which is the case in many applications. Therefore, the
signal $x(n)$ can be predicted only approximately from a linearly
weighted summation of past samples.

Let this approximation of $x(n)$ be $\hat{x}(n)$, where

$$\hat{x}(n) = -\sum_{k=1}^{p} A_k\, x(n-k), \tag{2-39}$$

then the error between the actual value $x(n)$ and the predicted
value $\hat{x}(n)$ is given by

$$e(n) = x(n) - \hat{x}(n) = x(n) + \sum_{k=1}^{p} A_k\, x(n-k). \tag{2-40}$$

$e(n)$ is also known as the residual. In the method of least
squares, the parameters $A_k$ are obtained as a result of the
minimization of the mean square error with respect to all of
the parameters.
If the signal $x(n)$ is assumed to be a sample of a random
process, then the error in equation (2-40) is also a sample
of a random process. In the least squares method, we minimize
the expected value of the square of the error,

$$E[e^2(n)] = E\left\{ \left[ x(n) + \sum_{k=1}^{p} A_k\, x(n-k) \right]^2 \right\}. \tag{2-41}$$

$E[e^2(n)]$ is minimized by setting

$$\frac{\partial E[e^2(n)]}{\partial A_i} = 0, \qquad 1 \le i \le p. \tag{2-42}$$

From (2-41) and (2-42) we obtain the set of equations

$$\sum_{k=1}^{p} A_k\, E[x(n-k)\, x(n-i)] = -E[x(n)\, x(n-i)], \qquad 1 \le i \le p. \tag{2-43}$$

Then the minimum average error is given by

$$E_{\min} = E[x^2(n)] + \sum_{k=1}^{p} A_k\, E[x(n)\, x(n-k)]. \tag{2-44}$$
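The step from (2-42) to (2-43), and from there to (2-44), can be spelled out as follows. This is a short worked derivation using only the definitions in (2-40) and (2-41); $E_{\min}$ is used here simply as a label for the minimum value of $E[e^2(n)]$.

$$\frac{\partial E[e^2(n)]}{\partial A_i} = 2\,E\!\left[ e(n)\, \frac{\partial e(n)}{\partial A_i} \right] = 2\,E[e(n)\, x(n-i)] = 0, \qquad 1 \le i \le p,$$

so the error is orthogonal to each past sample used in the prediction. Substituting $e(n)$ from (2-40) gives

$$E[x(n)\, x(n-i)] + \sum_{k=1}^{p} A_k\, E[x(n-k)\, x(n-i)] = 0,$$

which is (2-43). Using the same orthogonality in the expression for the mean square error,

$$E_{\min} = E\!\left[ e(n) \left( x(n) + \sum_{k=1}^{p} A_k\, x(n-k) \right) \right] = E[e(n)\, x(n)] = E[x^2(n)] + \sum_{k=1}^{p} A_k\, E[x(n)\, x(n-k)],$$

which is (2-44).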



For a stationary process $x(n)$, we have

$$E[x(n-k)\, x(n-i)] = R_{xx}(i-k)$$

where $R_{xx}(i)$ is the autocorrelation of the process.

Note that equations (2-42) and (2-44) lead to the well known
orthogonality principle [7]. Since in the least squares
method we assumed that the input is unknown, it does not make
much sense to determine a value for the gain $G$. However,
there are certain interesting points that can be made.






Equation (2-40) can be written as

$$x(n) = -\sum_{k=1}^{p} A_k\, x(n-k) + e(n) \tag{2-45}$$

Comparing (2-45) and (2-38), it is seen that the only input
signal $u(n)$ that will result in the signal $x(n)$ as output is
that where

$$G\, u(n) = e(n). \tag{2-46}$$

That is, the input signal is proportional to the error signal.
$e(n)$ can also be considered as the modeling error. The error
variance can be calculated by Equation (2-41), and the filter
coefficients $A_k$ ($k = 1, \ldots, p$) can be calculated by equation
(2-43) if the correlation function of the process $x(n)$ is available.

At this moment, it should be recalled that a linear random
process is generated by linear filtering of white noise. Therefore,
we are interested in white noise inputs for the purpose of
modeling a given linear random process. That is, the
input $u(n)$ is assumed to be a sequence of uncorrelated samples
(white noise) with zero mean and unit variance,

$$E[u(n)] = 0, \qquad E[u(n)\, u(m)] = \delta_{nm}.$$

Then the output $x(n)$ forms a stationary random process

$$x(n) = -\sum_{k=1}^{p} A_k\, x(n-k) + G\, u(n). \tag{2-47}$$

Multiplying equation (2-47) by $x(n-i)$ and taking the expectation,

$$E[x(n)\, x(n-i)] = -\sum_{k=1}^{p} A_k\, E[x(n-k)\, x(n-i)] + E[G\, u(n)\, x(n-i)]. \tag{2-48}$$

Noting that $u(n)$ and $x(n-i)$ are uncorrelated for $i > 0$ and
recalling that for a stationary process $E[x(n)\, x(n-i)] = R_{xx}(i)$,
Equation (2-48) turns out to be

$$R_{xx}(i) = -\sum_{k=1}^{p} A_k\, R_{xx}(i-k), \qquad 1 \le i \le p \tag{2-49}$$

and $R_{xx}(0)$ can be obtained by plugging $x(n)$ of Equation
(2-38) into Equation (2-48) with $i = 0$:

$$R_{xx}(0) = -\sum_{k=1}^{p} A_k\, R_{xx}(k) + G^2. \tag{2-50}$$

Therefore, the gain can be given by

$$G^2 = R_{xx}(0) + \sum_{k=1}^{p} A_k\, R_{xx}(k). \tag{2-51}$$
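Equations (2-49) and (2-51) amount to solving a small linear system built from the autocorrelation values. The following is a minimal sketch under the assumption that $R_{xx}(0), \ldots, R_{xx}(p)$ are known exactly; the function name `all_pole_from_autocorr` and the test autocorrelation $\rho^{|m|}$ with `rho = 0.9` are illustrative choices, not part of the original development.

```python
import numpy as np

def all_pole_from_autocorr(R, p):
    """Solve (2-49) for A_1..A_p and (2-51) for G^2, given R[m] = R_xx(m), m = 0..p."""
    R = np.asarray(R, dtype=float)
    # Matrix with entries R_xx(|i - k|) for i, k = 1..p (symmetric Toeplitz)
    idx = np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
    A = np.linalg.solve(R[idx], -R[1:p + 1])   # sum_k A_k R_xx(i-k) = -R_xx(i)
    G2 = R[0] + A @ R[1:p + 1]                 # G^2 = R_xx(0) + sum_k A_k R_xx(k)
    return A, G2

rho = 0.9                                  # hypothetical correlation coefficient
R = rho ** np.arange(3)                    # R_xx(m) = rho^|m| for m = 0, 1, 2
A, G2 = all_pole_from_autocorr(R, p=2)
```

For this first-order autocorrelation, a second-order fit returns $A_1 = -\rho$, $A_2 = 0$ and $G^2 = 1 - \rho^2$, so the extra coefficient is correctly found to be unnecessary.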



It is noted that through Equation (2-46), that is $G\, u(n) = e(n)$,
the white noise input of zero mean and unit variance generates
the random process $e(n)$, which is again white with zero mean
and variance $G^2$. Therefore, from Equation (2-49), the recursive
filter coefficients $A_k$ ($k = 1, \ldots, p$) can be calculated,
and using these values the gain $G$ can be calculated by Equation
(2-51) with the knowledge of the autocorrelation function of a given
class of linear random process. So far, modeling of the one-dimensional
stochastic process has been considered. Similar
reasoning can readily be extended to the modeling of two-dimensional
random fields. Again, the two-dimensional all-pole
model is considered:

$$x(k,\ell) = -\sum_{i=0}^{M_1} \sum_{\substack{j=0 \\ (i,j) \ne (0,0)}}^{M_2} A_{ij}\, x(k-i,\, \ell-j) + G\, u(k,\ell) \tag{2-52}$$
Let us define the set $\Omega(k,\ell)$ such that for all $i, j$ with
$(k-i,\, \ell-j) \in \Omega(k,\ell)$,
all the values of $x$ in $\Omega(k,\ell)$ will be used to estimate
the point $x(k,\ell)$.

FIGURE 2-2
DEFINITION OF $\Omega(k,\ell)$
Again the coefficients $A_{ij}$ will be determined such that the
mean squared error is minimized. The estimate of $x(k,\ell)$,
$\hat{x}(k,\ell)$, is given by a linear combination of the previous values,

$$\hat{x}(k,\ell) = -\sum\sum_{\Omega} A_{ij}\, x(k-i,\, \ell-j). \tag{2-53}$$

The mean squared error is

$$E[e^2(k,\ell)] = E\{ [x(k,\ell) - \hat{x}(k,\ell)]^2 \}. \tag{2-54}$$

If $E[e^2(k,\ell)]$ is minimized, $\hat{x}(k,\ell)$ is the "linear
least squares estimate" of $x(k,\ell)$.

Going through the same procedure as in the one-dimensional
case, that is, substituting (2-53) in (2-54), differentiating
with respect to each $A_{ij}$, and setting the derivatives equal to zero, we
obtain the following set of simultaneous equations for the
unknown $A_{ij}$:

$$E\{ [x(k,\ell) - \hat{x}(k,\ell)]\, x(i,j) \} = 0 \quad \text{for all } (i,j) \in \Omega, \tag{2-55}$$

which says that the coefficients $A_{ij}$ must be such that the
estimation error $[x(k,\ell) - \hat{x}(k,\ell)]$ is statistically orthogonal to
each $x(i,j)$ that is used to form the linear estimate. This
is known as the orthogonality principle in linear least squares
estimation.

The modeling error is the difference between the true
value $x(k,\ell)$ and the estimate $\hat{x}(k,\ell)$. By definition,

$$e(k,\ell) = x(k,\ell) - \hat{x}(k,\ell) = G\, u(k,\ell) \tag{2-56}$$

from equations (2-52) and (2-53).

Again, we are interested in the white noise field of zero
mean and unit variance. Then the modeling error is also a
random field.



Rewriting Equation (2-52) in terms of the error $e(k,\ell)$ gives

$$x(k,\ell) = -\sum\sum_{\Omega} A_{ij}\, x(k-i,\, \ell-j) + e(k,\ell). \tag{2-57}$$

To calculate the error variance, multiply (2-57) by $x(k-m,\, \ell-n)$
and take the expectations:

$$E[x(k,\ell)\, x(k-m,\ell-n)] = -\sum\sum_{\Omega} A_{ij}\, E[x(k-i,\ell-j)\, x(k-m,\ell-n)] + E[e(k,\ell)\, x(k-m,\ell-n)]. \tag{2-58}$$

For the stationary process,
$E[x(k-i,\ell-j)\, x(k-m,\ell-n)] = R_{xx}(m-i,\, n-j)$, and (2-58) becomes

$$R_{xx}(m,n) = -\sum\sum_{\Omega} A_{ij}\, R_{xx}(m-i,\, n-j) \quad \text{for all } (m,n) \in \Omega. \tag{2-59}$$

The second term on the right side of (2-58) is zero because of the
orthogonality principle, and $R_{xx}(0,0)$ can be obtained by the
following equation:

$$R_{xx}(0,0) = -\sum\sum_{\Omega} A_{ij}\, R_{xx}(i,j) + G^2. \tag{2-60}$$

Therefore the modeling error variance,
$E[e^2(k,\ell)] = E\{ [G\, u(k,\ell)]^2 \} = G^2$, is given by

$$G^2 = E[e^2(k,\ell)] = R_{xx}(0,0) + \sum\sum_{\Omega} A_{ij}\, R_{xx}(i,j),$$

and the mean of the error $e(k,\ell)$ is $E[e(k,\ell)] = E[G\, u(k,\ell)] = 0$.

Again, with the knowledge of the autocorrelation function of the
two-dimensional stationary random field, the filter coefficients
$A_{ij}$ can be obtained from Equation (2-59), and using these
values of $A_{ij}$, the gain $G$ in Equation (2-52) can be calculated
by Equation (2-60).

Example 1. Consider a one-dimensional stationary band limited
random process for which the autocorrelation function is given by

$$R_{xx}(m) = \rho^{|m|} \cos(w_0 m).$$

Find a model which describes this process. An
all-pole model is chosen such that

$$x(n) = -\sum_{k=1}^{p} A_k\, x(n-k) + G\, u(n)$$

where $E[u(n)] = 0$ and $E[u(n)\, u(m)] = \delta_{mn}$.

One has to choose the order of the difference equation; here, a second
order model ($p = 2$) is chosen. Then

$$x(n) = -A_1\, x(n-1) - A_2\, x(n-2) + G\, u(n).$$

The problem shrinks down to that of finding $A_1$, $A_2$, and $G$ with the
given $R_{xx}(m)$.

From Equation (2-49),

$$R_{xx}(i) = -\sum_{k=1}^{2} A_k\, R_{xx}(i-k)$$
$$R_{xx}(1) = -A_1 R_{xx}(0) - A_2 R_{xx}(-1)$$
$$R_{xx}(2) = -A_1 R_{xx}(1) - A_2 R_{xx}(0)$$

Putting this in matrix form,

$$\begin{pmatrix} -R_{xx}(0) & -R_{xx}(1) \\ -R_{xx}(1) & -R_{xx}(0) \end{pmatrix} \begin{pmatrix} A_1 \\ A_2 \end{pmatrix} = \begin{pmatrix} R_{xx}(1) \\ R_{xx}(2) \end{pmatrix}$$

From $R_{xx}(m) = \rho^{|m|} \cos(w_0 m)$,

$$\begin{pmatrix} -1 & -\rho \cos w_0 \\ -\rho \cos w_0 & -1 \end{pmatrix} \begin{pmatrix} A_1 \\ A_2 \end{pmatrix} = \begin{pmatrix} \rho \cos w_0 \\ \rho^2 \cos 2w_0 \end{pmatrix}$$

Therefore

$$A_1 = -\frac{\rho \cos w_0\, (1 - \rho^2 \cos 2w_0)}{1 - \rho^2 \cos^2 w_0}$$

$$A_2 = \frac{\rho^2 \sin^2 w_0}{1 - \rho^2 \cos^2 w_0}$$

From Equation (2-51),

$$G^2 = R_{xx}(0) + \sum_{k=1}^{2} A_k\, R_{xx}(k) = 1 + A_1 R_{xx}(1) + A_2 R_{xx}(2) = 1 + A_1\, \rho \cos w_0 + A_2\, \rho^2 \cos 2w_0$$

$$= 1 - \frac{\rho^2 \cos^2 w_0\, (1 - \rho^2 \cos 2w_0)}{1 - \rho^2 \cos^2 w_0} + \frac{\rho^4 \sin^2 w_0 \cos 2w_0}{1 - \rho^2 \cos^2 w_0}$$
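The closed-form result of Example 1 can be cross-checked by solving the second-order normal equations (2-49) numerically. The values `rho = 0.9` and `w0 = 0.6` below are hypothetical, and the closed-form expressions are taken with $A_1$ negative, which is what solving the 2x2 system by hand gives.

```python
import numpy as np

rho, w0 = 0.9, 0.6                          # hypothetical values for illustration

def R(m):
    """Autocorrelation of Example 1: rho^|m| * cos(w0 * m)."""
    return rho ** abs(m) * np.cos(w0 * m)

# Normal equations (2-49) for p = 2: sum_k A_k R(i-k) = -R(i), i = 1, 2
M = np.array([[R(0), R(1)],
              [R(1), R(0)]])
A1, A2 = np.linalg.solve(M, -np.array([R(1), R(2)]))
G2 = R(0) + A1 * R(1) + A2 * R(2)           # gain from (2-51)

# Closed-form solution of the same 2x2 system
den = 1 - rho**2 * np.cos(w0)**2
A1_closed = -rho * np.cos(w0) * (1 - rho**2 * np.cos(2 * w0)) / den
A2_closed = rho**2 * np.sin(w0)**2 / den
G2_closed = 1 + A1_closed * rho * np.cos(w0) + A2_closed * rho**2 * np.cos(2 * w0)
```

As a sanity check on the signs, letting $\rho \to 1$ gives $A_1 \to -2\cos w_0$ and $A_2 \to 1$, the familiar recursion of a pure sinusoid.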



Example 2 Given a stationary two-dimensional random field with

$$R_{xx}(m,n) = \rho^{|m|}\rho^{|n|},$$

find the autoregressive model of this random field (most monochromatic images can be assumed to have this form of autocorrelation function). A first-order model ($M_1 = 1$, $M_2 = 1$) is chosen. Then Equation (2-52) can be written as follows:

$$x(k,\ell) = -A_{10}\,x(k-1,\ell) - A_{11}\,x(k-1,\ell-1) - A_{01}\,x(k,\ell-1) + G\,u(k,\ell)$$

where $u(k,\ell)$ is white noise with zero mean and unit variance. Then, using Equation (2-59),

$$R_{xx}(m,n) = -\sum_{i=0}^{1}\sum_{\substack{j=0 \\ (i,j)\neq(0,0)}}^{1} A_{ij}\,R_{xx}(m-i,\,n-j) \qquad \text{for } (m,n) \neq (0,0)$$

$$R_{xx}(1,0) = -A_{10}R_{xx}(0,0) - A_{11}R_{xx}(0,-1) - A_{01}R_{xx}(1,-1)$$

$$R_{xx}(1,1) = -A_{10}R_{xx}(0,1) - A_{11}R_{xx}(0,0) - A_{01}R_{xx}(1,0)$$

$$R_{xx}(0,1) = -A_{10}R_{xx}(-1,1) - A_{11}R_{xx}(-1,0) - A_{01}R_{xx}(0,0)$$

Putting this in matrix form gives

$$\begin{bmatrix} -R_{xx}(0,0) & -R_{xx}(0,-1) & -R_{xx}(1,-1) \\ -R_{xx}(0,1) & -R_{xx}(0,0) & -R_{xx}(1,0) \\ -R_{xx}(-1,1) & -R_{xx}(-1,0) & -R_{xx}(0,0) \end{bmatrix} \begin{bmatrix} A_{10} \\ A_{11} \\ A_{01} \end{bmatrix} = \begin{bmatrix} R_{xx}(1,0) \\ R_{xx}(1,1) \\ R_{xx}(0,1) \end{bmatrix}$$

For the given autocorrelation function above

$$\begin{bmatrix} -1 & -\rho & -\rho^2 \\ -\rho & -1 & -\rho \\ -\rho^2 & -\rho & -1 \end{bmatrix} \begin{bmatrix} A_{10} \\ A_{11} \\ A_{01} \end{bmatrix} = \begin{bmatrix} \rho \\ \rho^2 \\ \rho \end{bmatrix}$$

which yields

$$A_{01} = -\rho, \qquad A_{11} = \rho^2, \qquad A_{10} = -\rho$$

The modeling error variance, or the square of the gain G, can be calculated by Equation (2-60)

$$G^2 = R_{xx}(0,0) + \sum_{i=0}^{1}\sum_{\substack{j=0 \\ (i,j)\neq(0,0)}}^{1} A_{ij}\,R_{xx}(i,j)$$

$$= 1 + A_{01}R_{xx}(0,1) + A_{11}R_{xx}(1,1) + A_{10}R_{xx}(1,0)$$

$$= (1-\rho^2)^2$$

Therefore the model for the random field with $R_{xx}(m,n) = \rho^{|m|}\rho^{|n|}$ is

$$x(k,\ell) = \rho\,x(k,\ell-1) - \rho^2\,x(k-1,\ell-1) + \rho\,x(k-1,\ell) + (1-\rho^2)\,u(k,\ell)$$

where

$$E[u(k,\ell)] = 0$$

$$E[u(k,\ell)\,u(k-p,\ell-q)] = \delta_{pq}.$$
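The coefficients of Example 2 can be checked numerically. The sketch below builds the 3 x 3 normal equations from the separable autocorrelation and solves them; the function name and the value of rho are assumptions made only for illustration.

```python
import numpy as np

def quarter_plane_ar(rho):
    """First-order 2-D AR coefficients for R(m, n) = rho**(|m| + |n|)."""
    def R(m, n):
        return rho ** (abs(m) + abs(n))
    # Unknowns in the order [A10, A11, A01]; each shift (i, j) labels a coefficient
    shifts = [(1, 0), (1, 1), (0, 1)]
    # Row (m, n): R(m, n) = -sum_ij A_ij R(m - i, n - j)
    lhs = -np.array([[R(m - i, n - j) for (i, j) in shifts] for (m, n) in shifts])
    rhs = np.array([R(m, n) for (m, n) in shifts])
    A10, A11, A01 = np.linalg.solve(lhs, rhs)
    # Squared gain: G^2 = R(0,0) + sum_ij A_ij R(i, j)
    G2 = R(0, 0) + A10 * R(1, 0) + A11 * R(1, 1) + A01 * R(0, 1)
    return A10, A11, A01, G2

rho = 0.95  # assumed illustrative value
A10, A11, A01, G2 = quarter_plane_ar(rho)
```

For this separable field the solution reduces to A10 = A01 = -rho, A11 = rho^2 and G^2 = (1 - rho^2)^2, matching the worked example.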

3 . Method of Filter Response 

Another method of modeling a linear stationary random process is based on the concept that a linear random process is the result of filtering white noise through a linear filter. The properties of linear systems were discussed in Section B of this chapter.

For the discrete system, it is known that the filter output power spectral density is the Z transform of the output correlation function. That is,

$$G_{yy}(z) = \mathcal{Z}[R_{yy}(m)]$$

$$G_{yy}(z_1, z_2) = \mathcal{Z}[R_{yy}(m,n)]$$

and also it is noted that

$$G_{yy}(z) = G_{xx}(z)\,H(z)\,H(z^{-1})$$

$$G_{yy}(z_1, z_2) = G_{xx}(z_1, z_2)\,H(z_1, z_2)\,H(z_1^{-1}, z_2^{-1}),$$

where $G_{xx}(z)$, $G_{xx}(z_1, z_2)$ is the input power spectral density function and $H(z)$, $H(z_1, z_2)$ is the transfer function of the filter. Denoting the white noise input as $u(n)$ or $u(k,\ell)$ and the output of the filter, which is the linear stationary random process we are going to model, as $x(n)$ or $x(k,\ell)$, the problem can be stated as follows: For a given autocorrelation function of a linear random process (field) $R(m)$, ($R(m,n)$), find a linear system such that when the input is white noise, the output of the filter has the given autocorrelation function $R(m)$, ($R(m,n)$).




[Block diagram: filter output $x(n)$, $x(k,\ell)$; given $R_{xx}(m)$, $R_{xx}(m,n)$]



If the input is white noise, then

$$G_{uu}(z) = \text{const},$$

$$G_{uu}(z_1, z_2) = \text{const}.$$

Therefore, the solution for the required filter is to find a function $H(z)$ or $H(z_1, z_2)$ which satisfies

$$G_{xx}(z) = \text{const}\cdot H(z)\,H(z^{-1})$$

$$G_{xx}(z_1, z_2) = \text{const}\cdot H(z_1, z_2)\,H(z_1^{-1}, z_2^{-1}) \tag{2-62}$$

where $G_{xx}(z)$, $G_{xx}(z_1, z_2)$ is known through the relation

$$G_{xx}(z) = \mathcal{Z}[R_{xx}(m)]$$

$$G_{xx}(z_1, z_2) = \mathcal{Z}[R_{xx}(m,n)]$$

since $R_{xx}(m)$, $R_{xx}(m,n)$ is given.

But for the two-dimensional case, there is an inherent difficulty in the factorization of $G_{xx}(z_1, z_2)$ into $H(z_1, z_2)\,H(z_1^{-1}, z_2^{-1})$. Therefore, only separable functions can be modeled by this technique.

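Example 1 For a given stationary linear random process with

$$R_{xx}(m) = \sigma^2\rho^{|m|}\cos(\omega_0 m), \qquad m = 0, \pm 1, \pm 2, \ldots$$

find a difference equation which will give a random process with the autocorrelation above.

$$G_{xx}(z) = \mathcal{Z}[R_{xx}(m)] = \frac{\sigma^2(1-\rho^2)\left[-z\rho\cos\omega_0 + (1+\rho^2) - z^{-1}\rho\cos\omega_0\right]}{(1-2\rho z\cos\omega_0+\rho^2 z^2)(1-2\rho z^{-1}\cos\omega_0+\rho^2 z^{-2})} \tag{2-51}$$

The second step is to find a factored expression for $G_{xx}(z)$, i.e.

$$G_{xx}(z) = G_1(z)\cdot G_1(z^{-1})$$

Assuming that, apart from the constant factor $\sigma^2(1-\rho^2)$, $G_{xx}(z)$ has the form

$$\frac{a z^{-1} + b}{1-2\rho z^{-1}\cos\omega_0+\rho^2 z^{-2}}\cdot\frac{a z + b}{1-2\rho z\cos\omega_0+\rho^2 z^{2}} \tag{2-52}$$

and comparing (2-52) and (2-51), which requires $a^2 + b^2 = 1 + \rho^2$ and $ab = -\rho\cos\omega_0$, a and b can be obtained as

$$a = \tfrac{1}{2}\left(\sqrt{1-2\rho\cos\omega_0+\rho^2} + \sqrt{1+2\rho\cos\omega_0+\rho^2}\right)$$

$$b = \tfrac{1}{2}\left(\sqrt{1-2\rho\cos\omega_0+\rho^2} - \sqrt{1+2\rho\cos\omega_0+\rho^2}\right)$$

From Equation (2-50), if we choose

$$H(z) = \frac{a z^{-1} + b}{1-2\rho z^{-1}\cos\omega_0+\rho^2 z^{-2}}$$

then the constant factor $\sigma^2(1-\rho^2)$ of Equation (2-51) can be considered as $G_{uu}(z)$. Therefore

$$R_{uu}(m) = \begin{cases} (1-\rho^2)\,\sigma^2 & \text{if } m = 0 \\ 0 & \text{if } m \neq 0 \end{cases}$$

and the complete model is drawn in block diagram form in Figure 2-4.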




FIGURE 2-4

FILTER FOR ONE-DIMENSIONAL BAND PASS PROCESS

To put the system's input-output relation into state equation form, it is defined for convenience that

$$U(z) = z^{-1}\,W(z)$$

where $w(n)$ is also white noise with the same autocorrelation function

$$R_{ww}(m) = \begin{cases} (1-\rho^2)\,\sigma^2 & \text{if } m = 0 \\ 0 & \text{if } m \neq 0 \end{cases}$$

Then the transfer function is

$$H(z) = \frac{X(z)}{U(z)} = \frac{a z^{-1} + b}{1-2\rho\cos\omega_0\,z^{-1}+\rho^2 z^{-2}}$$

Defining

$$X_1(z) = -\rho^2 z^{-1} X_2(z)$$

$$X_2(z) = \frac{U(z)}{1-2\rho\cos\omega_0\,z^{-1}+\rho^2 z^{-2}}$$

$$X(z) = (a z^{-1} + b)\,X_2(z)$$

then

$$x_1(n) = -\rho^2\,x_2(n-1)$$

$$x_2(n) = x_1(n-1) + 2\rho\cos\omega_0\,x_2(n-1) + w(n-1)$$

and

$$x(n) = a\,x_2(n-1) + b\,x_2(n) = -\rho^{-2}a\,x_1(n) + b\,x_2(n)$$

Putting the state and output equations in matrix form gives

$$\begin{bmatrix} x_1(n) \\ x_2(n) \end{bmatrix} = \begin{bmatrix} 0 & -\rho^2 \\ 1 & 2\rho\cos\omega_0 \end{bmatrix} \begin{bmatrix} x_1(n-1) \\ x_2(n-1) \end{bmatrix} + \begin{bmatrix} 0 \\ 1 \end{bmatrix} w(n-1)$$

$$x(n) = \begin{bmatrix} -\rho^{-2}a & b \end{bmatrix} \begin{bmatrix} x_1(n) \\ x_2(n) \end{bmatrix}$$
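A short numerical sketch can confirm two things about this example: that the numerator factors satisfy a^2 + b^2 = 1 + rho^2 and ab = -rho cos(w0), and that the state-space form reproduces the impulse response of the transfer function driven through the one-sample delay U(z) = z^{-1} W(z). All function names and parameter values below are illustrative assumptions.

```python
import numpy as np

def spectral_factor(rho, w0):
    """Numerator coefficients a, b with (a/z + b)(a z + b) = (1 + rho^2) - rho cos(w0)(z + 1/z)."""
    c = rho * np.cos(w0)
    s_minus = np.sqrt(1.0 - 2.0 * c + rho ** 2)
    s_plus = np.sqrt(1.0 + 2.0 * c + rho ** 2)
    return 0.5 * (s_minus + s_plus), 0.5 * (s_minus - s_plus)

def state_space_impulse(rho, w0, a, b, n_samples):
    """Response of the two-state model to a unit impulse w(0) = 1."""
    F = np.array([[0.0, -rho ** 2], [1.0, 2.0 * rho * np.cos(w0)]])
    g = np.array([0.0, 1.0])
    h = np.array([-a / rho ** 2, b])
    state = np.zeros(2)
    out = np.zeros(n_samples)
    for n in range(n_samples):
        w_prev = 1.0 if n == 1 else 0.0  # w(n-1) for an impulse at n = 0
        state = F @ state + g * w_prev
        out[n] = h @ state
    return out

def direct_impulse(rho, w0, a, b, n_samples):
    """Same response from x(n) = 2 rho cos(w0) x(n-1) - rho^2 x(n-2) + b w(n-1) + a w(n-2)."""
    w = np.zeros(n_samples)
    w[0] = 1.0
    x = np.zeros(n_samples)
    for n in range(n_samples):
        x1 = x[n - 1] if n >= 1 else 0.0
        x2 = x[n - 2] if n >= 2 else 0.0
        w1 = w[n - 1] if n >= 1 else 0.0
        w2 = w[n - 2] if n >= 2 else 0.0
        x[n] = 2.0 * rho * np.cos(w0) * x1 - rho ** 2 * x2 + b * w1 + a * w2
    return x

rho, w0 = 0.9, 0.7  # assumed illustrative values
a, b = spectral_factor(rho, w0)
ss = state_space_impulse(rho, w0, a, b, 50)
direct = direct_impulse(rho, w0, a, b, 50)
```

Agreement between the two impulse responses indicates that the state and output equations realize the chosen H(z).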
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Example 2 The two-dimensional "band limited" discrete 



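Markov process is defined by the autocorrelation function

$$R_{xx}(m,n) = \sigma^2\left[\rho_1^{|m|}\cos(\omega_1 m)\right]\left[\rho_2^{|n|}\cos(\omega_2 n)\right], \qquad m, n = 0, \pm 1, \pm 2, \ldots$$

Then the discrete power spectral density is

$$G_{xx}(z_1, z_2) = \sigma^2(1-\rho_1^2)(1-\rho_2^2)\,\frac{A(z_1, z_2)}{B(z_1, z_2)}$$

where

$$A(z_1, z_2) = \left[-z_1\rho_1\cos\omega_1 + (1+\rho_1^2) - z_1^{-1}\rho_1\cos\omega_1\right]\left[-z_2\rho_2\cos\omega_2 + (1+\rho_2^2) - z_2^{-1}\rho_2\cos\omega_2\right]$$

$$B(z_1, z_2) = \left[(1-2\rho_1 z_1\cos\omega_1+\rho_1^2 z_1^2)(1-2\rho_1 z_1^{-1}\cos\omega_1+\rho_1^2 z_1^{-2})\right]\left[(1-2\rho_2 z_2\cos\omega_2+\rho_2^2 z_2^2)(1-2\rho_2 z_2^{-1}\cos\omega_2+\rho_2^2 z_2^{-2})\right]$$

Putting $A(z_1, z_2)$ in the following form

$$A(z_1, z_2) = \left[(a_1 z_1^{-1}+b_1)(a_1 z_1+b_1)\right]\left[(a_2 z_2^{-1}+b_2)(a_2 z_2+b_2)\right]$$

and comparing this equation with $A(z_1, z_2)$ above, one obtains $a_1$, $a_2$, $b_1$, $b_2$ as

$$a_1 = \tfrac{1}{2}\left(\sqrt{1-2\rho_1\cos\omega_1+\rho_1^2} + \sqrt{1+2\rho_1\cos\omega_1+\rho_1^2}\right)$$

$$a_2 = \tfrac{1}{2}\left(\sqrt{1-2\rho_2\cos\omega_2+\rho_2^2} + \sqrt{1+2\rho_2\cos\omega_2+\rho_2^2}\right)$$

$$b_1 = \tfrac{1}{2}\left(\sqrt{1-2\rho_1\cos\omega_1+\rho_1^2} - \sqrt{1+2\rho_1\cos\omega_1+\rho_1^2}\right)$$

$$b_2 = \tfrac{1}{2}\left(\sqrt{1-2\rho_2\cos\omega_2+\rho_2^2} - \sqrt{1+2\rho_2\cos\omega_2+\rho_2^2}\right)$$

Let

$$H(z_1, z_2) = \frac{(a_1 z_1^{-1}+b_1)(a_2 z_2^{-1}+b_2)}{(1-2\rho_1 z_1^{-1}\cos\omega_1+\rho_1^2 z_1^{-2})(1-2\rho_2 z_2^{-1}\cos\omega_2+\rho_2^2 z_2^{-2})}$$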
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then from (2-50) 



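$$R_{uu}(m,n) = \begin{cases} \sigma^2(1-\rho_1^2)(1-\rho_2^2) & \text{if } n = 0 \text{ and } m = 0 \\ 0 & \text{if } n \neq 0 \text{ or } m \neq 0 \end{cases}$$

It is convenient to define

$$U(z_1, z_2) = z_1^{-1} z_2^{-1}\,W(z_1, z_2)$$

where $w(k,\ell)$ is also white noise with the same statistics as $u(k,\ell)$:

$$R_{ww}(n,m) = \begin{cases} \sigma^2(1-\rho_1^2)(1-\rho_2^2) & \text{if } n = 0 \text{ and } m = 0 \\ 0 & \text{if } n \neq 0 \text{ or } m \neq 0 \end{cases}$$

Then, the filter has the form

$$X(z_1, z_2) = \frac{(a_1 z_1^{-1}+b_1)(a_2 z_2^{-1}+b_2)\,z_1^{-1} z_2^{-1}\,W(z_1, z_2)}{(1-2\rho_1 z_1^{-1}\cos\omega_1+\rho_1^2 z_1^{-2})(1-2\rho_2 z_2^{-1}\cos\omega_2+\rho_2^2 z_2^{-2})}$$

The following definitions are made:

$$N_1(z_1, z_2) = -\rho_1^2\,z_1^{-1}\,N_2(z_1, z_2)$$

$$N_2(z_1, z_2) = \frac{z_1^{-1}\,W(z_1, z_2)}{1-2\rho_1 z_1^{-1}\cos\omega_1+\rho_1^2 z_1^{-2}}$$

$$X_1(z_1, z_2) = (a_1 z_1^{-1}+b_1)\,N_2(z_1, z_2)$$

From the last three definitions one can write the set of difference equations (the equations as given treat $z_1^{-1}$ as a unit delay in $\ell$ and $z_2^{-1}$ as a unit delay in $k$)

$$\begin{bmatrix} N_1(k,\ell) \\ N_2(k,\ell) \end{bmatrix} = \begin{bmatrix} 0 & -\rho_1^2 \\ 1 & 2\rho_1\cos\omega_1 \end{bmatrix} \begin{bmatrix} N_1(k,\ell-1) \\ N_2(k,\ell-1) \end{bmatrix} + \begin{bmatrix} 0 \\ 1 \end{bmatrix} w(k,\ell-1)$$

$$x_1(k,\ell) = -\rho_1^{-2}a_1\,N_1(k,\ell) + b_1\,N_2(k,\ell)$$

Now, additional definitions are made.

$$M_1(z_1, z_2) = -\rho_2^2\,z_2^{-1}\,M_2(z_1, z_2)$$

$$M_2(z_1, z_2) = \frac{z_2^{-1}\,X_1(z_1, z_2)}{1-2\rho_2 z_2^{-1}\cos\omega_2+\rho_2^2 z_2^{-2}}$$

From these definitions it follows that

$$X(z_1, z_2) = (a_2 z_2^{-1}+b_2)\,M_2(z_1, z_2) \tag{2-55}$$

Then

$$\begin{bmatrix} M_1(k,\ell) \\ M_2(k,\ell) \end{bmatrix} = \begin{bmatrix} 0 & -\rho_2^2 \\ 1 & 2\rho_2\cos\omega_2 \end{bmatrix} \begin{bmatrix} M_1(k-1,\ell) \\ M_2(k-1,\ell) \end{bmatrix} + \begin{bmatrix} 0 \\ 1 \end{bmatrix} x_1(k-1,\ell)$$

$$x(k,\ell) = -\rho_2^{-2}a_2\,M_1(k,\ell) + b_2\,M_2(k,\ell) \tag{2-56}$$

Combining these all together, the following form is obtained.

$$\begin{bmatrix} M_1(k+1,\ell) \\ M_2(k+1,\ell) \\ N_1(k,\ell+1) \\ N_2(k,\ell+1) \end{bmatrix} = \begin{bmatrix} 0 & -\rho_2^2 & 0 & 0 \\ 1 & 2\rho_2\cos\omega_2 & -\rho_1^{-2}a_1 & b_1 \\ 0 & 0 & 0 & -\rho_1^2 \\ 0 & 0 & 1 & 2\rho_1\cos\omega_1 \end{bmatrix} \begin{bmatrix} M_1(k,\ell) \\ M_2(k,\ell) \\ N_1(k,\ell) \\ N_2(k,\ell) \end{bmatrix} + \begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \end{bmatrix} w(k,\ell)$$

Two methods of modeling have been discussed. Some of the above examples will be used in later chapters. It should be noted again that the filter response method cannot be used for the case where the autocorrelation function is nonseparable.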
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III. ADAPTIVE FILTERS 



A. NONRECURSIVE FILTER 
1 . Introduction 

Many forms of adaptive filters have been described in
the literature, some of which have been shown to be optimal
(or suboptimal) in certain applications. The special form
of an adaptive nonrecursive filter developed by Widrow [1]
is reviewed here to give some insight into the recursive
adaptive filter developed in the next section.

The filter to be considered here consists of a tapped 
delay line, variable weights whose input signals are the signals 
at the delay line taps, a summer to add the weighted signals, 
and machinery to adjust the weights automatically. The impulse 
response of such a discrete system is completely controlled 
by the weights. The adaptation process automatically seeks an 
optimal filter impulse response by adjusting the weights. 

Two kinds of processes take place in the adaptive filter: 
training and operating. The training (adaptation) process is 
concerned with adjusting the weights, and the operating process 
consists in forming output signals by weighting the delay line 
tap signals, using the weights resulting from the training 
process. During the training process, an additional input
signal, "the reference (or desired) response," must be supplied
to the adaptive filter along with the usual input signals.
This requirement may in some cases restrict the use of
this particular form of adaptive filter. An example illustrating
the use of the desired signal is the case of modeling an
unknown system by a discrete adaptive filter as shown in Figure
3-1. Here a discrete input signal x(n) is applied to an unknown
system to be modeled. The discrete adaptive model is
supplied with the same input x(n). The output of the unknown system
d(n) is compared with the output y(n) of the adaptive system.
This system can self-adapt to minimize the mean square error
(throughout this thesis, the mean square error is chosen
as the performance measure), where the error is defined as the
difference between the output of the adaptive model and the
desired signal (for this problem the desired signal is the output
of the unknown system to be modeled).




FIGURE 3-1 MODELLING OF UNKNOWN SYSTEM 
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Then 



$$y(n) = \sum_{i=0}^{N} w_i\,x(n-i) \tag{3-1}$$

$$e(n) = y(n) - d(n) \tag{3-2}$$

Noting that Equation (3-1) is the convolution summation, the
sequence of weights $\{w_i\}$ can be seen as the impulse response
of the adaptive system.



It will be shown that if the input and output signals of the 
system being modeled are stationary, the error signal has a 
mean square value which is a quadratic function of the weight 
settings . 

For the minimization of mean square error, the steepest 
descent method is used. Throughout this thesis , the terms 

"filter coefficient updating process" and "adaptation process" are 
used interchangeably and it is assumed that the input to the 
adaptive system and the desired signal are stationary random 
processes (or random fields for the two-dimensional case). 

2 . Performance Surface, Gradient and the Wiener Solution . 

The input signals are weighted and summed to form an
output signal [Equation (3-1)]. The weights $w_i$ are adjusted to minimize $E(e^2)$. Introducing the vector notation
such that

$$W^T = [w_0,\ w_1,\ w_2,\ \ldots,\ w_N]$$

and

$$X^T(n) = [x(n),\ x(n-1),\ x(n-2),\ \ldots,\ x(n-N)] \tag{3-3}$$

then Equation (3-1), which describes the linear combination
(operating process), can be written in matrix form.

$$y(n) = W^T X(n) = X^T(n)\,W \tag{3-4}$$

and

$$e(n) = y(n) - d(n) = W^T X(n) - d(n) \tag{3-5}$$

The square of this error is

$$e^2(n) = W^T X(n)\,X^T(n)\,W - 2\,d(n)\,W^T X(n) + d^2(n) \tag{3-6}$$

The mean square error, the expected value of $e^2(n)$, is

$$E[e^2(n)] = E[d^2(n)] - 2\,W^T R_{xd} + W^T R_{xx} W \tag{3-7}$$

where the vector of cross-correlations between the input signals
and the desired response is defined as

$$R_{xd} = E[d(n)\,X(n)] = E\begin{bmatrix} d(n)\,x(n) \\ d(n)\,x(n-1) \\ \vdots \\ d(n)\,x(n-N) \end{bmatrix} \tag{3-8}$$

and where the correlation matrix of the input signal is defined as

$$R_{xx} = E[X(n)\,X^T(n)] = E\begin{bmatrix} x(n)x(n) & x(n)x(n-1) & \cdots & x(n)x(n-N) \\ x(n-1)x(n) & x(n-1)x(n-1) & \cdots & x(n-1)x(n-N) \\ \vdots & & & \vdots \\ x(n-N)x(n) & \cdots & & x(n-N)x(n-N) \end{bmatrix} \tag{3-9}$$
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It may be observed from (3-7) that for stationary input signals, 
the mean-square error is precisely a second order function of 
the weights. The mean square error performance function may be 
visualized as a bowl shaped surface, a quadratic function of the 
weight variables. Then the adaptive process has the job of con- 
tinually seeking the "bottom of the bowl." A means of accomplish- 
ing this by the well-known method of steepest descent is dis- 
cussed below. 

In the nonstationary case, the bottom of the bowl may be
moving, and the orientation and curvature of the bowl may be
changing. The adaptive process has to track the bottom of the
bowl when inputs are nonstationary. The method of steepest
descent uses the gradient of the performance surface in seeking its
minimum. The gradient at any point on the performance surface
may be obtained by differentiating the mean-square error function
of Equation (3-7) with respect to the weight vector. The
gradient is

$$\nabla\{E[e^2(n)]\} = -2\,R_{xd} + 2\,R_{xx}\,W \tag{3-10}$$

To find the "optimal" weight vector $W_{LMS}$, i.e., the one that yields the
least mean square error, set the gradient to zero. Accordingly,

$$R_{xx}\,W_{LMS} = R_{xd}$$

$$W_{LMS} = R_{xx}^{-1}\,R_{xd} \tag{3-11}$$

Equation (3-11) is known as the Wiener-Hopf equation in matrix form.
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Then the minimum mean square error may be obtained by 
substituting (3-11) into (3-7)

$$E[e^2(n)]_{\min} = E[d^2(n)] - W_{LMS}^T\,R_{xd} \tag{3-12}$$
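To make Equations (3-7), (3-11) and (3-12) concrete, the sketch below evaluates them for an assumed case: an input with autocorrelation rho^|m| and a desired signal produced by a known three-tap system h, so that R_xd = R_xx h. The statistics, names and values here are illustrative assumptions, not data from the thesis.

```python
import numpy as np

def wiener_solution(R_xx, R_xd, d_power):
    """Solve R_xx W = R_xd (3-11) and return the minimum mean square error (3-12)."""
    W = np.linalg.solve(R_xx, R_xd)
    mse_min = d_power - W @ R_xd
    return W, mse_min

def mean_square_error(W, R_xx, R_xd, d_power):
    """Quadratic performance surface of Equation (3-7)."""
    return d_power - 2.0 * W @ R_xd + W @ R_xx @ W

# Assumed second-order statistics: r(m) = rho**|m|, desired signal d(n) = h^T X(n)
rho = 0.8
h = np.array([1.0, -0.5, 0.25])
idx = np.arange(len(h))
R_xx = rho ** np.abs(idx[:, None] - idx[None, :])
R_xd = R_xx @ h
d_power = h @ R_xx @ h

W_opt, mse_min = wiener_solution(R_xx, R_xd, d_power)
mse_off = mean_square_error(W_opt + 0.1, R_xx, R_xd, d_power)
```

Because the desired signal is an exact linear combination of the inputs in this assumed setup, the optimum recovers h and the minimum error is zero; any displaced weight vector gives a larger value on the bowl.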

3 . LMS Algorithm 

In seeking the minimum mean-square error by the 
method of steepest descent , one begins with an initial 
guess as to where the minimum point of the mean-square error 
surface may be. This means that one begins with a set of initial 
conditions for the weights. The gradient vector is then mea- 
sured, and the next guess is obtained from the present guess 
by making a change in the weight vector in the direction of the 
negative of the gradient vector. The method of steepest descent 
can thus be described by the following relation 

$$W(n+1) = W(n) + k\,\nabla[E(e^2(n))] \tag{3-13}$$

where the scalar k is negative. The expression for $\nabla[E(e^2(n))]$ is obtained by using Equation (3-10).

$$W(n+1) = W(n) + 2k\,R_{xx}\,W(n) - 2k\,R_{xd} \tag{3-14}$$

The gradient vector $\nabla[E(e^2(n))]$ is the gradient of the expectation
of the squared error function when the weight vector is W(n).

When the performance function is quadratic, the gradient
is a linear function of the weights. The advantage of working
with the quadratic performance surface lies both in this linear
relation and in the fact that such a surface has a unique
minimum.
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The purpose of the adaptation process is to find an exact or 
an approximate solution to the Wiener-Hopf equation (3-11).

One way of finding the optimum weight vector is simply to solve
(3-11). Although this solution is generally straightforward,
it could present serious computational problems when the number
of weights N is large and when input data rates are high.

In addition to the necessity of inverting an N x N matrix, this
method may require as many as N(N+1)/2 autocorrelation and
cross-correlation measurements to be made to obtain the elements of
$R_{xx}$ and $R_{xd}$.

No perfect solution of Equation (3-11) is possible in practice,
because an infinite statistical sample would be required to
estimate perfectly the elements of the correlation matrices.

A method for finding approximate solutions to (3-11) is
presented below. The accuracy of this method is limited by
statistical sample size, since weight values are found that
are based on finite-time measurements of input-data signals.

This method does not require explicit measurements of
correlation functions, nor does it require matrix inversion.

It is the "LMS" algorithm based on the steepest descent method.
This algorithm does not even require squaring, averaging, or
differentiation in order to make use of gradients of
mean-square error functions.

When using the LMS algorithm, changes in the weight vector 
are made along the direction of the estimated gradient vector. 
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Accordingly 

$$W(n+1) = W(n) + k\,\hat{\nabla}\{E[e^2(n)]\} \tag{3-13}$$

where

W(n) = weight vector before adaptation
W(n+1) = weight vector after adaptation
k = scalar constant controlling rate of convergence and stability (k < 0)
$\hat{\nabla}[E(e^2(n))]$ = estimate of the gradient of $E[e^2(n)]$ with respect to W, with W = W(n)

One method for obtaining the estimated gradient of the 
mean square error function is to take the gradient of a single 
time sample of the squared error; that is 

$$\hat{\nabla}[E(e^2(n))] = \nabla[e^2(n)] = 2\,e(n)\,\nabla[e(n)] \tag{3-14}$$

From Equation (3-4)

$$\nabla[e(n)] = \nabla[y(n) - d(n)] = \nabla[W^T(n)\,X(n) - d(n)] = X(n) \tag{3-15}$$

Thus,

$$\hat{\nabla}[E(e^2(n))] = 2\,e(n)\,X(n) = 2\,[W^T(n)\,X(n) - d(n)]\,X(n) \tag{3-16}$$

The gradient estimate of (3-16) is unbiased, as will be shown
by the following argument: For a given weight vector W(n), the
expected value of the gradient estimate is

$$E\{\hat{\nabla}[E(e^2(n))]\} = 2\,E\{[W^T(n)\,X(n) - d(n)]\,X(n)\} = 2\,R_{xx}\,W(n) - 2\,R_{xd} \tag{3-17}$$

Comparing (3-17) and (3-10), we see that

$$E\{\hat{\nabla}[E(e^2(n))]\} = \nabla[E(e^2(n))] \tag{3-18}$$

and therefore for a given weight vector, the gradient estimate
$\hat{\nabla}[E(e^2(n))]$ is unbiased.

Using the gradient estimation formula (3-16), the weight
iteration rule Equation (3-13) becomes

$$W(n+1) = W(n) + 2k\,e(n)\,X(n) \tag{3-19}$$

and the next weight vector is obtained by adding to the present
weight vector the input vector scaled by the value of the error. This is the LMS
algorithm. Looking at Equation (3-19), the adaptation process
is a simple first order recursion equation which can be realized
as shown in Figure 3-2.

FIGURE 3-2 FILTER COEFFICIENT UPDATING

To this point, the basic concept of the LMS nonrecursive adaptive filter
has been introduced and reviewed. More details can be seen in [1].
Widrow [1] showed that the weight vector mean converges to the
Wiener solution and that the step size k should lie in the
region

$$-\frac{1}{\lambda_{\max}} < k < 0$$

for stability and convergence, where $\lambda_{\max}$ is the maximum eigenvalue of $R_{xx}$.
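The update rule (3-19) can be illustrated with a minimal system-identification sketch in the spirit of Figure 3-1. The unknown system h, the white-noise input, the record length and the step size k = -0.01 are assumptions chosen only for this illustration; the sign convention (k negative, e = y - d) follows the text.

```python
import numpy as np

def lms_identify(x, d, n_taps, k):
    """Adapt a transversal filter with W(n+1) = W(n) + 2 k e(n) X(n), k < 0."""
    W = np.zeros(n_taps)
    errors = np.zeros(len(x))
    for n in range(n_taps - 1, len(x)):
        # X(n) = [x(n), x(n-1), ..., x(n-N)]
        X = x[n - n_taps + 1 : n + 1][::-1]
        e = W @ X - d[n]          # e(n) = y(n) - d(n)
        W = W + 2.0 * k * e * X   # Equation (3-19)
        errors[n] = e
    return W, errors

rng = np.random.default_rng(0)
x = rng.standard_normal(5000)       # assumed white input, so lambda_max is near 1
h = np.array([0.5, -0.3, 0.2])      # hypothetical unknown system
d = np.convolve(x, h)[: len(x)]     # desired signal: output of the unknown system
W, errors = lms_identify(x, d, n_taps=3, k=-0.01)
```

With a noise-free desired signal the weights settle on h and the error decays toward zero; with measurement noise the weights would instead fluctuate around the Wiener solution.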

4 . Two-dimensional adaptive filter , 
a. Adaptive filter structure 

The input-output relation of the two-dimensional
filter is given by two-dimensional convolution.

$$y(k,\ell) = \sum_{i=0}^{p}\sum_{j=0}^{q} w_{ij}\,x(k-i,\ \ell-j), \tag{3-20}$$

where $y(k,\ell)$ is the filter output
and $w_{ij}$ is the finite impulse response of the filter.

Here, it is assumed that the input $x(k,\ell)$ is a stationary random
field.

In Equation (3-20), a set of two-dimensional
stationary input signals is weighted and summed to form an output
signal, and the filter output is intended to match a desired
(reference) signal in accordance with the minimization of mean
squared error, where the error is the difference between the filter
output and the desired signal.

$$e(k,\ell) = y(k,\ell) - d(k,\ell). \tag{3-21}$$
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Introducing the vector notation such that 



$$W^T = [w_{00}\ \ w_{01}\ \ldots\ w_{0q}\ \ w_{10}\ \ w_{11}\ \ldots\ w_{1q}\ \ w_{20}\ \ldots\ w_{pq}]$$

and

$$X^T = [x(k,\ell),\ x(k,\ell-1),\ \ldots,\ x(k,\ell-q),\ x(k-1,\ell),\ x(k-1,\ell-1),\ \ldots,\ x(k-1,\ell-q),\ x(k-2,\ell),\ \ldots,\ x(k-p,\ell-q)] \tag{3-22}$$

then Equation (3-20) can be written in matrix form.

$$y(k,\ell) = W^T X = X^T W \tag{3-23}$$

where W is a weight vector of dimension (p+1)(q+1) x 1 and
X is an input signal vector of dimension (p+1)(q+1) x 1.
The weight vector of the filter is to be adjusted in
the direction such that the performance criterion (mean square error)
is minimized. Thus, the linear combinatorial system
in Equation (3-20) will be given with variable weights.

[Figure 3-3: two-dimensional adaptive filter]

In Figure 3-3, the linear combinatorial structure is given. Then,
the adaptive nonrecursive filter can be drawn as follows:




FIGURE 3-4 STRUCTURE OF NONRECURSIVE ADAPTIVE 
FILTER 



b. Wiener solution 

From Equations (3-22) and (3-23), the error signal
can be written as

$$e(k,\ell) = W^T X - d(k,\ell) \tag{3-24}$$

The square of this error is

$$e^2(k,\ell) = W^T X X^T W - 2\,d(k,\ell)\,X^T W + d^2(k,\ell) \tag{3-25}$$

The expected value of $e^2(k,\ell)$ is

$$E[e^2(k,\ell)] = E[d^2(k,\ell)] - 2\,R_{xd}^T\,W + W^T R_{xx} W \tag{3-26}$$

where the vector of cross correlations between the input
signal and desired response is defined as

$$R_{xd} = E[d(k,\ell)\,X] = E\begin{bmatrix} d(k,\ell)\,x(k,\ell) \\ d(k,\ell)\,x(k,\ell-1) \\ \vdots \\ d(k,\ell)\,x(k,\ell-q) \\ d(k,\ell)\,x(k-1,\ell) \\ \vdots \\ d(k,\ell)\,x(k-1,\ell-q) \\ \vdots \\ d(k,\ell)\,x(k-p,\ell-q) \end{bmatrix} \tag{3-28}$$

and where the correlation matrix of the input signals is defined as

$$R_{xx} = E[X X^T] = E\begin{bmatrix} x_1 x_1 & x_1 x_2 & x_1 x_3 & \cdots & x_1 x_{(p+1)(q+1)} \\ x_2 x_1 & x_2 x_2 & x_2 x_3 & \cdots & x_2 x_{(p+1)(q+1)} \\ x_3 x_1 & x_3 x_2 & x_3 x_3 & \cdots & x_3 x_{(p+1)(q+1)} \\ \vdots & & & & \vdots \\ x_{(p+1)(q+1)} x_1 & \cdots & & & x_{(p+1)(q+1)} x_{(p+1)(q+1)} \end{bmatrix} \tag{3-29}$$

where $x_1 = x(k,\ell)$, ..., $x_{(p+1)(q+1)} = x(k-p,\ell-q)$.



The gradient at any point on the performance surface can be
obtained by differentiating the mean square error function of
Equation (3-26)

$$\nabla[E(e^2(k,\ell))] = \frac{\partial E(e^2(k,\ell))}{\partial W} = -2\,R_{xd} + 2\,R_{xx}\,W$$

To find the "optimal" weight vector $W_{LMS}$ that yields the least
mean square error, set the gradient to zero. Accordingly,

$$R_{xd} = R_{xx}\,W_{LMS}$$

$$W_{LMS} = R_{xx}^{-1}\,R_{xd} \tag{3-30}$$

Equation (3-30) is the Wiener-Hopf equation in matrix form. Again,
the minimum mean square error is obtained by substituting
(3-30) into (3-26).

$$E[e^2(k,\ell)]_{\min} = E[d^2(k,\ell)] - W_{LMS}^T\,R_{xd} \tag{3-31}$$



c. LMS Algorithm 

Consider a two-dimensional field $x(k,\ell)$ to be processed
(usually two-dimensional filters are used in processing
discrete two-dimensional image fields) and assume that the
two-dimensional field consists of N x N discrete points (which may be a
signal sensed by the N x N pixel elements of a sensor). The adaptation
process is that of adjusting the filter coefficients $w_{ij}$ in
accordance with the minimization of the mean square error, as in the
one-dimensional case. The adaptation scheme may be predetermined as
columnwise, diagonal, or row-wise scanning.

Here the row scanning process is adopted as shown in Figure 3-5.
Therefore, N x N adaptation steps will be required to complete
the filtering of an N x N two-dimensional field. Using $e(k,\ell)$
to denote the error at the jth iteration (then $e(k+1,\ell)$ is
the (j+1)th error), the filter coefficient updating process
can be described by

$$W(j+1) = W(j) + u\,\hat{\nabla}[E(e^2(k,\ell))]$$

where W(j+1) = coefficient vector after adaptation
W(j) = coefficient vector before adaptation
u = negative scalar constant controlling rate of convergence and stability.

The gradient of the mean square error is to be estimated by

$$\hat{\nabla}[E(e^2(k,\ell))] = \nabla[e^2(k,\ell)]$$

where $e(k,\ell) = y(k,\ell) - d(k,\ell)$; then

$$W(j+1) = W(j) + 2u\,e(k,\ell)\,\nabla[e(k,\ell)] = W(j) + 2u\,e(k,\ell)\,X$$

where X is defined by Equation (3-22).

Along with $y(k,\ell) = W^T X$
and $e(k,\ell) = y(k,\ell) - d(k,\ell)$,
the LMS algorithm will be completed.
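A minimal sketch of the two-dimensional LMS update, under stated assumptions, is given here. It scans a field along each row, keeps the weights as a (p+1) x (q+1) array instead of a stacked vector, and uses a hypothetical reference filter w_true to generate the desired field; the field size and the step size u = -0.01 are likewise assumed for illustration.

```python
import numpy as np

def fir_2d(x, w):
    """y(k, l) = sum_ij w[i, j] x(k-i, l-j), computed where all samples exist."""
    p, q = w.shape[0] - 1, w.shape[1] - 1
    y = np.zeros_like(x)
    for k in range(p, x.shape[0]):
        for l in range(q, x.shape[1]):
            window = x[k - p : k + 1, l - q : l + 1][::-1, ::-1]
            y[k, l] = np.sum(w * window)
    return y

def lms_2d(x, d, p, q, u):
    """Adapt W point by point with W <- W + 2 u e(k, l) X, where u < 0."""
    W = np.zeros((p + 1, q + 1))
    y = np.zeros_like(x)
    for k in range(p, x.shape[0]):
        for l in range(q, x.shape[1]):
            # window[i, j] = x(k-i, l-j), the entries of X in Equation (3-22)
            window = x[k - p : k + 1, l - q : l + 1][::-1, ::-1]
            y[k, l] = np.sum(W * window)
            e = y[k, l] - d[k, l]
            W = W + 2.0 * u * e * window
    return W, y

rng = np.random.default_rng(1)
x = rng.standard_normal((64, 64))                # assumed white input field
w_true = np.array([[0.6, -0.2], [0.3, 0.1]])     # hypothetical reference filter
d = fir_2d(x, w_true)
W, y = lms_2d(x, d, p=1, q=1, u=-0.01)
```

Storing the weights as a two-dimensional array is only a convenience; flattening both W and the window row by row gives exactly the vectors of Equation (3-22).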

B. RECURSIVE FILTER 
1 . Introduction 

In the previous section, it was shown that adaptive nonrecursive
filters have a finite impulse response; that is, they
can produce only zeros, with no poles, in the filter transfer
function. This limits the capability of transversal adaptive
filters in many applications. To overcome this limitation a
new adaptive filter structure is described which is capable of
producing poles in the transfer function. The basic configuration
considered here is quite standard; that is, the present
output sample of the filter y(n) is a linear combination of the
present and past samples of the input x(n), x(n-1), ..., x(n-M)
and the past samples of the output, y(n-1), y(n-2), ..., y(n-N).
The present output sample of the filter is compared against a
reference sample. The resulting error samples are used to adjust
the filter parameters, the feedforward gains and feedback gains, to
minimize some error function. The one-dimensional recursive filter
is developed first; then it is extended to the two-dimensional
adaptive filter.

Recently Feintuch [2] and White [3] have proposed a technique
for making digital filters with zeros and poles adaptive. This
development may enhance the possibility of obtaining accurate
models for unknown systems. The new approach is developed into an
algorithm. It employs the steepest-descent criterion for parameter
adjustment, but it differs from Feintuch [2] and Widrow [1] in the
estimation of the mean squared error gradient vector.

2 . One-Dimensional Adaptive Recursive Filter 
a. Structure 

The recursive filter is described by its transfer function

$$H(z) = \frac{Y(z)}{X(z)} = \frac{\displaystyle\sum_{i=0}^{M} b_i\,z^{-i}}{1 + \displaystyle\sum_{i=1}^{N} a_i\,z^{-i}} \tag{3-32}$$

In the time domain, the input-output relation of the digital
filter is given by

$$y(n) = \sum_{i=0}^{M} b_i\,x(n-i) - \sum_{i=1}^{N} a_i\,y(n-i) \tag{3-33}$$

where y(n) = nth sample of the filter output
x(n) = nth sample of the filter input
$a_i$ = feedback coefficients, i = 1, 2, ..., N
$b_i$ = feedforward coefficients, i = 0, 1, 2, ..., M

The output samples of the filter are intended to match those of
a reference (or desired) signal d(n). In accordance with
the minimization of some error criterion, the filter parameters
$a_i$, $b_i$ will be adjusted at every iteration. The general scheme of the
adaptive recursive structure is given in Figure 3-6. Two
finite length transversal filters are used, in the forward path and
the feedback path, to form the recursive filter of Equation (3-33).
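The difference equation (3-33) can be written out directly as a short routine. This is only a sketch with fixed (non-adaptive) coefficients and zero initial conditions; the function name and the single-pole test case are illustrative assumptions.

```python
import numpy as np

def recursive_filter(b, a, x):
    """y(n) = sum_{i=0}^{M} b[i] x(n-i) - sum_{i=1}^{N} a[i-1] y(n-i)."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        acc = 0.0
        # Feedforward path: present and past input samples
        for i, bi in enumerate(b):
            if n - i >= 0:
                acc += bi * x[n - i]
        # Feedback path: past output samples only
        for i, ai in enumerate(a, start=1):
            if n - i >= 0:
                acc -= ai * y[n - i]
        y[n] = acc
    return y

# Single pole at z = 0.5: H(z) = 1 / (1 - 0.5 z^-1), so a_1 = -0.5
impulse = np.zeros(8)
impulse[0] = 1.0
y = recursive_filter(b=[1.0], a=[-0.5], x=impulse)
```

The impulse response never terminates, which is the pole-producing behavior that a transversal filter cannot provide.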
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d (n) 




FIGURE 3-6 ADAPTIVE RECURSIVE LMS FILTER 
USING TWO TRANSVERSAL ADAPTIVE 
FILTERS 



b. Problem in Wiener Solution

Introducing vector notation for the signals and sets
of filter coefficients we have

$$A^T = [a_1,\ a_2,\ \ldots,\ a_N]$$

$$B^T = [b_0,\ b_1,\ \ldots,\ b_M]$$

$$X(n)^T = [x(n),\ x(n-1),\ \ldots,\ x(n-M)]$$

$$Y(n)^T = [y(n-1),\ y(n-2),\ \ldots,\ y(n-N)]$$

Then Equation (3-33) can be written as

$$y(n) = B^T X(n) - A^T Y(n) \tag{3-34}$$

where A is the feedback coefficient vector (N x 1)
B is the feedforward coefficient vector [(M+1) x 1]
X(n) is the input signal vector at the nth iteration [(M+1) x 1]
Y(n) is the output signal vector at the nth iteration (N x 1)

The performance criterion is again minimum mean squared error,
where the error is the difference between the filter output and the desired
signal (reference signal).

That is, the filter is used to estimate a desired waveform
d(n) in a minimum mean square error sense. Assume that the
observables are stationary and zero mean and let e(n) denote the
error waveform at the nth sample; then

$$e(n) = y(n) - d(n) = B^T X(n) - A^T Y(n) - d(n) \tag{3-35}$$

and the mean square error is

$$E[e^2(n)] = E[(B^T X(n) - A^T Y(n) - d(n))^2]$$

$$= E[B^T X(n) X^T(n) B - 2\,B^T X(n) Y^T(n) A + A^T Y(n) Y^T(n) A - 2\,B^T d(n) X(n) + 2\,A^T d(n) Y(n) + d^2(n)]$$

$$= E[d^2(n)] + B^T R_{xx} B + A^T R_{yy} A - 2\,B^T R_{xy} A - 2\,B^T R_{dx} + 2\,A^T R_{dy} \tag{3-36}$$

where

$$R_{xx} = E[X(n)\,X^T(n)]$$

$$R_{yy} = E[Y(n)\,Y^T(n)]$$

$$R_{dx} = E[d(n)\,X(n)]$$

$$R_{dy} = E[d(n)\,Y(n)]$$

and

$$R_{xy} = E[X(n)\,Y^T(n)]$$
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The theory of Wiener filtering employs known second-order input 

statistics to dictate the impulse response of the linear filter
that minimizes the mean square error; that is, as in the previous
section, knowledge of the second order statistics $R_{dx}$, $R_{xx}$ is
assumed in order to calculate the optimum impulse response
(the optimum weight vector $W_{LMS}$ in the nonrecursive adaptive filter).

But here, in the recursive algorithm, it is also required that
the autocorrelation of the output, $R_{yy}$, the cross correlation
of the output and the input, $R_{xy}$, and the cross correlation of the
output with the desired waveform, $R_{dy}$, be assumed known.

Thus, the set of statistics mentioned above is assumed to be known
for the moment, and will be used to determine the weights in
the recursive filter. The statistics for the fixed parameter
filter are not a function of the weights, but instead
the weights are a function of these statistics. Therefore, $R_{xy}$,
$R_{dx}$ and $R_{yy}$ are to be considered constant when the differentiation
is made with respect to A and B.

The set of weights (filter coefficient vectors) which minimize
the mean square error can be found by setting the gradient vector
with respect to the filter parameters equal to zero.



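$$\frac{\partial E[e^2(n)]}{\partial A} = \nabla_A[E(e^2(n))] = 2\,R_{yy}\,A - 2\,R_{xy}^T\,B + 2\,R_{dy} = 0 \tag{3-37}$$

and

$$\frac{\partial E[e^2(n)]}{\partial B} = \nabla_B[E(e^2(n))] = 2\,R_{xx}\,B - 2\,R_{dx} - 2\,R_{xy}\,A = 0$$

$$B = R_{xx}^{-1}\,[R_{dx} + R_{xy}\,A] \tag{3-38}$$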



Thus, one can solve for the filter coefficients if all the
second order statistics are known. But without knowing the
impulse response of the filter, $R_{xy}$, $R_{dy}$ and $R_{yy}$ cannot be
calculated with only input and reference signal statistics.

Noting that we are looking for the impulse response which
minimizes the mean square error in some way, it is clear that
$R_{xy}$, $R_{dy}$, $R_{yy}$ are not available and so the Wiener approach is
not feasible.

C. LMS ALGORITHM . 

An iterative gradient search technique (the method of steepest 
descent) is revisited. Here, in the recursive algorithm, it 
updates the filter coefficients with steps proportional to the 
gradient vector. This updating process is 
A(n+1) = A(n) + k AA 

cl 

= A (n) + k a V A [E(e 2 (n) ) ] (3-39) 

B (n+1) = B (n) + k fa AB 

= B(n) + k fa V B [E (e 2 (n) ] 

where 

A(n), B(n) = filter coefficient vectors before adaptation 
A (n+1) ; B(n+1) = filter coefficient vectors after adaptation 
k a , k b = scalar constants controlling rate of convergence 
and stability (^ a , ^j 3 < °) 

V A [E (e 2 (n) ) ] , V B [E(e 2 (n))] = gradient vectors with respect 

to A and B respectively. 
The updating process (3-39) can be considered as a first-order
filtering process with an input proportional to the gradient vector.
But the gradients $\nabla_A[E(e^2(n))]$ and $\nabla_B[E(e^2(n))]$ must be
estimated, because the output statistics are not available a priori, or
an infinite statistical sample would be required to estimate
perfectly the elements of the correlation matrices in Equations
(3-37) and (3-38). A method of estimating these gradients will
be presented.

Widrow [1] obtained the estimated gradient of the mean square
error function by taking the gradient of a single time sample of
the squared error (an instantaneous estimate) when he discussed
the nonrecursive adaptive filter (see previous section).

In this thesis, a new method of estimating the gradients is
proposed: approximate the mean squared error $E[e^2(n)]$ by an
average over a finite number of points at every iteration, and take
the gradient of this average instead of the instantaneous squared
error. That is, the approximation used is

$$E(e^2(n)) \approx \frac{1}{L}\sum_{\ell=0}^{L-1} e^2(n-\ell) \tag{3-40}$$

For $E(e^2(n))$, the average of the squared error over the previous L
points is taken, and the gradient is then evaluated for this
approximate mean square error. The estimated gradient of the mean
square error is

$$\hat\nabla_A E[e^2(n)] = \nabla_A\Big[\sum_{\ell=0}^{L-1} e^2(n-\ell)\Big]$$

$$\hat\nabla_B E[e^2(n)] = \nabla_B\Big[\sum_{\ell=0}^{L-1} e^2(n-\ell)\Big] \tag{3-41}$$
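The L-point average of (3-40) can be maintained with a small sliding
window. The sketch below is only an illustration: the class name is
hypothetical, and dividing by the number of samples currently held
(rather than by L) during the first L-1 iterations is an assumption,
since the text does not say how start-up is handled.

```python
from collections import deque


class WindowedSquaredError:
    """Sliding-window estimate of E[e^2(n)] over the last L error samples."""

    def __init__(self, L):
        self.window = deque(maxlen=L)   # oldest sample drops out automatically

    def push(self, e):
        self.window.append(e)

    def mean_square(self):
        # Average of e^2 over the samples currently in the window.
        return sum(e * e for e in self.window) / len(self.window)


estimator = WindowedSquaredError(L=2)
for error in (1.0, 2.0, 3.0):
    estimator.push(error)
latest = estimator.mean_square()   # uses only the last two errors, 2.0 and 3.0
```

With L = 1 the window holds a single sample and the estimate reduces to
the instantaneous squared error.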
For convenience, the (1/L) term in (3-40) was dropped. Introducing the
vector notation for the error signal

$$\bar e(n) = [\,e(n),\ e(n-1),\ \ldots,\ e(n-L+1)\,]^T$$

it is seen that the error signal vector is an (L×1) vector. The
estimated gradient (Equation (3-41)) can be put into the matrix form

$$\hat\nabla_A E[e^2(n)] = \nabla_A[\bar e^T(n)\,\bar e(n)]$$

$$\hat\nabla_B E[e^2(n)] = \nabla_B[\bar e^T(n)\,\bar e(n)]$$

Substituting the estimated gradients in Equation (3-39), the updating
process for the filter coefficients is:

$$A(n+1) = A(n) + k_a\,\nabla_A[\bar e^T(n)\,\bar e(n)]$$

and

$$B(n+1) = B(n) + k_b\,\nabla_B[\bar e^T(n)\,\bar e(n)] \tag{3-42}$$

The function $\bar e^T(n)\,\bar e(n)$ is a scalar function of the coefficient
vectors A and B, that is,

$$\bar e^T(n)\,\bar e(n) = f(A, B)$$

Therefore, by definition, the gradient of f(A, B) with respect
to A and B is
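$$\frac{\partial f}{\partial A} = \Big[\frac{\partial f}{\partial a_1},\ \frac{\partial f}{\partial a_2},\ \ldots,\ \frac{\partial f}{\partial a_N}\Big]^T, \qquad \frac{\partial f}{\partial B} = \Big[\frac{\partial f}{\partial b_0},\ \frac{\partial f}{\partial b_1},\ \ldots,\ \frac{\partial f}{\partial b_M}\Big]^T$$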

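It follows that

$$\hat\nabla_A[E(e^2(n))] = \nabla_A[\bar e^T(n)\,\bar e(n)] = \begin{bmatrix} \partial(\bar e^T(n)\bar e(n))/\partial a_1 \\ \partial(\bar e^T(n)\bar e(n))/\partial a_2 \\ \vdots \\ \partial(\bar e^T(n)\bar e(n))/\partial a_N \end{bmatrix} = \begin{bmatrix} 2\,\bar e^T(n)\,\partial\bar e(n)/\partial a_1 \\ 2\,\bar e^T(n)\,\partial\bar e(n)/\partial a_2 \\ \vdots \\ 2\,\bar e^T(n)\,\partial\bar e(n)/\partial a_N \end{bmatrix} \tag{3-43}$$

and

$$\hat\nabla_B[E(e^2(n))] = \nabla_B[\bar e^T(n)\,\bar e(n)] = \begin{bmatrix} \partial(\bar e^T(n)\bar e(n))/\partial b_0 \\ \partial(\bar e^T(n)\bar e(n))/\partial b_1 \\ \vdots \\ \partial(\bar e^T(n)\bar e(n))/\partial b_M \end{bmatrix} = \begin{bmatrix} 2\,\bar e^T(n)\,\partial\bar e(n)/\partial b_0 \\ 2\,\bar e^T(n)\,\partial\bar e(n)/\partial b_1 \\ \vdots \\ 2\,\bar e^T(n)\,\partial\bar e(n)/\partial b_M \end{bmatrix} \tag{3-44}$$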
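Consider the terms $\partial\bar e(n)/\partial a_p$ and $\partial\bar e(n)/\partial b_q$ in
equations (3-43) and (3-44), where p = 1, 2, ..., N and
q = 0, 1, 2, ..., M. Since e(n) and $\bar e(n)$ are defined as

$$e(n) = y(n) - d(n), \qquad \bar e^T(n) = (e(n),\ e(n-1),\ \ldots,\ e(n-L+1))$$

it follows that

$$\frac{\partial\bar e(n)}{\partial a_p} = \begin{bmatrix} \partial e(n)/\partial a_p \\ \partial e(n-1)/\partial a_p \\ \vdots \\ \partial e(n-L+1)/\partial a_p \end{bmatrix} = \begin{bmatrix} \partial y(n)/\partial a_p \\ \partial y(n-1)/\partial a_p \\ \vdots \\ \partial y(n-L+1)/\partial a_p \end{bmatrix}, \qquad p = 1, 2, \ldots, N \tag{3-45}$$

and

$$\frac{\partial\bar e(n)}{\partial b_q} = \begin{bmatrix} \partial e(n)/\partial b_q \\ \partial e(n-1)/\partial b_q \\ \vdots \\ \partial e(n-L+1)/\partial b_q \end{bmatrix} = \begin{bmatrix} \partial y(n)/\partial b_q \\ \partial y(n-1)/\partial b_q \\ \vdots \\ \partial y(n-L+1)/\partial b_q \end{bmatrix}, \qquad q = 0, 1, 2, \ldots, M \tag{3-46}$$

Note that $\partial\bar e(n)/\partial a_p$ and $\partial\bar e(n)/\partial b_q$ are (L×1) vectors
and $\bar e^T(n)$ is a (1×L) vector.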
Therefore, $\hat\nabla_A[E(e^2(n))]$ and $\hat\nabla_B[E(e^2(n))]$ in Equation (3-41)
are (N×1) and [(M+1)×1] vectors respectively.

Equations (3-45) and (3-46) may be considered as sensitivity
vectors which tell how much a change in $a_p$ and $b_q$ affects
the outputs y(n), y(n-1), ..., y(n-L+1).

To calculate the elements of the estimated gradients of the mean
square error in equation (3-41), we should first calculate the
sensitivity vectors of equations (3-45) and (3-46).

From the recursive equation (3-33),

$$\frac{\partial y(n)}{\partial a_p} = \frac{\partial}{\partial a_p}\Big[\sum_{i=0}^{M} b_i\,x(n-i) - \sum_{i=1}^{N} a_i\,y(n-i)\Big] = -y(n-p) - \sum_{i=1}^{N} a_i\,\frac{\partial y(n-i)}{\partial a_p}, \qquad p = 1, 2, \ldots, N \tag{3-47}$$

$$\frac{\partial y(n)}{\partial b_q} = x(n-q) - \sum_{i=1}^{N} a_i\,\frac{\partial y(n-i)}{\partial b_q}, \qquad q = 0, 1, 2, \ldots, M \tag{3-48}$$

The sensitivity vector components given by Equations (3-47) and
(3-48) can be interpreted as being the response of a linear system
with transfer function

$$H(z) = \frac{1}{1 + a_1 z^{-1} + \cdots + a_N z^{-N}} \tag{3-49}$$

Henceforth this will be called the "sensitivity filter." Equation
(3-49) is an all-pole filter (recursive filter) with input signals
[-y(n-p)] and [x(n-q)] respectively.
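The interpretation of (3-47) and (3-48) as an all-pole filter driven by
-y(n-p) and x(n-q) can be checked numerically for fixed coefficients.
The sketch below is an illustration only: the function names and test
values are hypothetical, the coefficients are held constant (no
adaptation), and all signals are assumed zero for n < 0.

```python
def recursive_filter(x, a, b):
    """y(n) = sum_i b_i x(n-i) - sum_i a_i y(n-i); a[i-1] holds a_i, i = 1..N."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for i, bi in enumerate(b):
            if n - i >= 0:
                acc += bi * x[n - i]
        for i, ai in enumerate(a, start=1):
            if n - i >= 0:
                acc -= ai * y[n - i]
        y.append(acc)
    return y


def all_pole(u, a):
    """Pass u through the sensitivity filter H(z) = 1 / (1 + sum_i a_i z^-i)."""
    return recursive_filter(u, a, [1.0])


def sensitivities(x, a, b):
    """Return y, [dy/da_p for p = 1..N] and [dy/db_q for q = 0..M]."""
    y = recursive_filter(x, a, b)

    def delayed(s, k, sign):
        # sign * s(n-k), with zeros before the sequence starts
        return [sign * s[n - k] if n - k >= 0 else 0.0 for n in range(len(s))]

    dy_da = [all_pole(delayed(y, p, -1.0), a) for p in range(1, len(a) + 1)]
    dy_db = [all_pole(delayed(x, q, 1.0), a) for q in range(len(b))]
    return y, dy_da, dy_db


x = [1.0, 0.5, -0.3, 0.8, 0.2, -0.6]   # hypothetical input samples
a = [-0.5, 0.2]                        # hypothetical feedback coefficients
b = [1.0, 0.4]                         # hypothetical feedforward coefficients
y, dy_da, dy_db = sensitivities(x, a, b)
```

Perturbing one coefficient slightly and differencing the two outputs of
`recursive_filter` should reproduce the corresponding sensitivity
sequence, which is a convenient sanity check before using these
quantities in a gradient estimate.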
Now, what are the initial conditions characterizing the
recursive relationships of the sensitivity filter?

x(n-q) is the input x(n) to the adaptive recursive filter of
Equation (3-33) delayed by q time units, and y(n-p) is the output
y(n) of the recursive filter delayed by p time units.

Since

x(n) = 0, y(n) = 0 for n < 0,

x(n-q) and y(n-p) are sequences whose first q elements and first p
elements, respectively, are zero. And since changes in the $a_p$ and
$b_q$ coefficients have no effect on the system's response until n = p
and n = q respectively, it follows that the initial conditions are:

$$\frac{\partial y(n)}{\partial a_p} = 0 \quad \text{for } n = 0, 1, \ldots, p-1$$

$$\frac{\partial y(n)}{\partial b_q} = 0 \quad \text{for } n = 0, 1, \ldots, q-1$$

A summary of this algorithm is the following:

1. Calculate the sensitivity vector components through the
sensitivity recursive filter by equations (3-47) and (3-48).

2. Calculate the estimated gradient by equations (3-43) and (3-44).

3. Calculate the filter coefficient vector by equation (3-42).

4. Calculate the filter output by equation (3-34).

5. Form the error vector $\bar e(n)$, then go back to the 1st step.
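The five steps above can be sketched as a single loop. This is a
minimal illustration under stated assumptions, not the thesis's
implementation: the function name and the test signal are hypothetical,
the output is computed before the update at each sample (the ordering
within one iteration is an assumption), the sensitivities stored in the
window are not recomputed after the coefficients change, and the 1/L
factor is dropped as in the text.

```python
import random
from collections import deque


def adaptive_recursive_filter(x, d, N, M, L, k_a, k_b):
    """L-point windowed-gradient recursive adaptive filter (sketch).

    a[p-1] holds a_p (p = 1..N), b[q] holds b_q (q = 0..M); k_a, k_b < 0.
    """
    a, b = [0.0] * N, [0.0] * (M + 1)
    x_hist = [0.0] * (M + 1)                      # x(n), ..., x(n-M)
    y_hist = [0.0] * N                            # y(n-1), ..., y(n-N)
    sa_hist = [[0.0] * N for _ in range(N)]       # past dy/da_p values
    sb_hist = [[0.0] * N for _ in range(M + 1)]   # past dy/db_q values
    window = deque(maxlen=L)                      # last L (e, dy/da, dy/db)
    outputs = []
    for xn, dn in zip(x, d):
        x_hist = [xn] + x_hist[:-1]
        # Filter output y(n) = B^T X - A^T Y and error e(n) = y(n) - d(n).
        y = (sum(bi * xi for bi, xi in zip(b, x_hist))
             - sum(ai * yi for ai, yi in zip(a, y_hist)))
        e = y - dn
        # Sensitivity recursions (3-47), (3-48) with the current coefficients.
        sa = [-y_hist[p] - sum(ai * s for ai, s in zip(a, sa_hist[p]))
              for p in range(N)]
        sb = [x_hist[q] - sum(ai * s for ai, s in zip(a, sb_hist[q]))
              for q in range(M + 1)]
        window.append((e, sa, sb))
        # Windowed gradients (3-43), (3-44): 2 * sum over the window of e * dy.
        grad_a = [2.0 * sum(w[0] * w[1][p] for w in window) for p in range(N)]
        grad_b = [2.0 * sum(w[0] * w[2][q] for w in window)
                  for q in range(M + 1)]
        # Coefficient update (3-42).
        a = [ai + k_a * g for ai, g in zip(a, grad_a)]
        b = [bi + k_b * g for bi, g in zip(b, grad_b)]
        # Shift the histories for the next sample.
        y_hist = ([y] + y_hist[:-1])[:N]
        sa_hist = [([sa[p]] + sa_hist[p][:-1])[:N] for p in range(N)]
        sb_hist = [([sb[q]] + sb_hist[q][:-1])[:N] for q in range(M + 1)]
        outputs.append(y)
    return outputs, a, b


random.seed(0)
x = [random.uniform(-1.0, 1.0) for _ in range(2000)]
d = [0.5 * v for v in x]            # hypothetical target: a pure gain of 0.5
_, a_hat, b_hat = adaptive_recursive_filter(x, d, N=1, M=0, L=4,
                                            k_a=-0.01, k_b=-0.01)
```

In this toy identification problem the desired response is exactly
reachable, so the feedforward gain should settle near 0.5 and the
feedback coefficient near zero; behavior on real data depends strongly
on the step sizes and on L.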

Note that because the gradient of a finite-point squared-error
average is used to estimate the true gradient of the mean square
error, this filter cannot give an optimal solution; however, the
more averaging points are used, the better the expected performance.
It should be noted that if L = 1, that is,

$$\hat\nabla_A[E(e^2(n))] = \nabla_A[e^2(n)]$$

$$\hat\nabla_B[E(e^2(n))] = \nabla_B[e^2(n)],$$

this corresponds to using the instantaneous squared error for
estimating the gradient, and this filter reduces to the adaptive
recursive filter proposed by White [3]. If the further approximation
is made that the sensitivity components of equations (3-47), (3-48)
are

$$\frac{\partial y(n)}{\partial a_p} = -y(n-p), \qquad \frac{\partial y(n)}{\partial b_q} = x(n-q),$$

then the estimated gradient is

$$\hat\nabla_A[E(e^2(n))] = -2e(n)\,[\,y(n-1),\ y(n-2),\ \ldots,\ y(n-N)\,]^T = -2e(n)\,Y \tag{3-50}$$

and

$$\hat\nabla_B[E(e^2(n))] = 2e(n)\,[\,x(n),\ x(n-1),\ \ldots,\ x(n-M)\,]^T = 2e(n)\,X \tag{3-51}$$
Then the filter coefficient updating process is

$$A(n+1) = A(n) - 2k_a\,e(n)\,Y$$

$$B(n+1) = B(n) + 2k_b\,e(n)\,X \tag{3-52}$$

where $k_a, k_b < 0$, and the filter output is

$$y(n) = B^T(n)\,X - A^T(n)\,Y \tag{3-53}$$

Equations (3-53), (3-52), (3-51), (3-50) are exactly the same as
the algorithm proposed by Feintuck [2]. This Feintuck algorithm
has an advantage in simplicity when compared with the algorithms
proposed by White [3] and proposed here, which require additional
recursive filters to generate the estimates of the gradient.
Thus, it may be useful to extend the Feintuck algorithm to the
two-dimensional recursive adaptive filter for simplicity. In
the next section, the algorithms proposed by White and Feintuck
are extended to the two-dimensional algorithm.
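One iteration of the simplified update (3-50) through (3-53) can be
sketched as follows. The function name and the numbers are hypothetical;
the only thing taken from the text is the form of the output, error, and
coefficient updates with negative step constants.

```python
def simplified_recursive_lms_step(a, b, x_vec, y_vec, d, k_a, k_b):
    """One update of (3-52), (3-53).

    x_vec = [x(n), ..., x(n-M)], y_vec = [y(n-1), ..., y(n-N)], k_a, k_b < 0.
    """
    # Filter output (3-53) and error e(n) = y(n) - d(n).
    y = (sum(bi * xi for bi, xi in zip(b, x_vec))
         - sum(ai * yi for ai, yi in zip(a, y_vec)))
    e = y - d
    # Coefficient updates (3-52); no sensitivity filters are needed.
    a_new = [ai - 2.0 * k_a * e * yi for ai, yi in zip(a, y_vec)]
    b_new = [bi + 2.0 * k_b * e * xi for bi, xi in zip(b, x_vec)]
    return y, e, a_new, b_new


y, e, a_new, b_new = simplified_recursive_lms_step(
    a=[0.0], b=[1.0], x_vec=[2.0], y_vec=[1.0], d=1.0, k_a=-0.1, k_b=-0.1)
```

In this example the output (2.0) exceeds the desired value (1.0), so the
update lowers the feedforward gain and raises the feedback coefficient,
both of which reduce the next output for the same data.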

3. Two-dimensional Recursive Adaptive Filter

In this section, a mathematical model of the adaptive
recursive filter for the processing of two-dimensional signals
is proposed. This can be considered as an extension of
Feintuck's algorithm to two-dimensional filters.

Two transversal filters having the same structure as the
linear combinatorial system used in the non-recursive
two-dimensional processor are used in the recursive processor,
one for the feedforward path and one for the feedback path.

The two-dimensional recursive filter is described by
its transfer function

$$H(z_1, z_2) = \frac{Y(z_1, z_2)}{X(z_1, z_2)} = \frac{\displaystyle\sum_{i=0}^{p}\sum_{j=0}^{q} b_{ij}\,z_1^{-i} z_2^{-j}}{1 + \displaystyle\sum_{\substack{m=0 \\ (m,n)\neq(0,0)}}^{M}\sum_{n=0}^{N} a_{mn}\,z_1^{-m} z_2^{-n}}$$

In the spatial domain, the input-output relation of the digital
filter is given by

$$y(k,\ell) = \sum_{i=0}^{p}\sum_{j=0}^{q} b_{ij}\,x(k-i,\ell-j) - \sum_{\substack{m=0 \\ (m,n)\neq(0,0)}}^{M}\sum_{n=0}^{N} a_{mn}\,y(k-m,\ell-n) \tag{3-54}$$



The following notation is introduced:

$$B^T = [\,b_{00}\ b_{01}\ \ldots\ b_{0q}\ b_{10}\ b_{11}\ \ldots\ b_{1q}\ b_{20}\ \ldots\ b_{pq}\,]$$

$$X^T = [\,x(k,\ell),\ x(k,\ell-1),\ \ldots,\ x(k,\ell-q),\ x(k-1,\ell),\ x(k-1,\ell-1),\ \ldots,\ x(k-1,\ell-q),\ x(k-2,\ell),\ \ldots,\ x(k-p,\ell-q)\,]$$

and

$$A^T = [\,a_{01}\ a_{02}\ \ldots\ a_{0N}\ a_{10}\ a_{11}\ \ldots\ a_{1N}\ a_{20}\ \ldots\ a_{MN}\,]$$

$$Y^T = [\,y(k,\ell-1),\ y(k,\ell-2),\ \ldots,\ y(k,\ell-N),\ y(k-1,\ell),\ y(k-1,\ell-1),\ \ldots,\ y(k-1,\ell-N),\ y(k-2,\ell),\ \ldots,\ y(k-M,\ell-N)\,]$$

The filter coefficient vectors A and B are [(M+1)(N+1)-1]×1
and (p+1)(q+1)×1, respectively, and the input and output signal
vectors X and Y are again (p+1)(q+1)×1 and [(M+1)(N+1)-1]×1,
respectively.

Then equation (3-54) can be written as

$$y(k,\ell) = B^T X - A^T Y \tag{3-55}$$
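The ordering of the elements in X and Y matters when the vector form
(3-55) is implemented. The sketch below builds both vectors at one pixel
in the row-by-row order given above. The function name is hypothetical,
and treating samples outside the image as zero is an assumption, since
boundary handling is not specified here.

```python
def build_vectors(x, y, k, l, p, q, M, N):
    """Return X and Y for pixel (k, l), ordered as in the notation above."""

    def at(img, r, c):
        # Assumption: samples outside the image are taken as zero.
        return img[r][c] if r >= 0 and c >= 0 else 0.0

    X = [at(x, k - i, l - j) for i in range(p + 1) for j in range(q + 1)]
    # The (0, 0) term is excluded: y(k, l) is the sample being computed.
    Y = [at(y, k - m, l - n) for m in range(M + 1)
         for n in range(N + 1) if (m, n) != (0, 0)]
    return X, Y


x = [[1.0, 2.0], [3.0, 4.0]]
y = [[5.0, 6.0], [7.0, 0.0]]   # y(1, 1) has not been computed yet
X, Y = build_vectors(x, y, k=1, l=1, p=1, q=1, M=1, N=1)
```

For p = q = M = N = 1 this gives a 4-element X and a 3-element Y,
matching the dimensions (p+1)(q+1) and (M+1)(N+1)-1 stated in the text.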

Here, to obtain an estimate of the gradients of the mean square error
function, a single sample of the squared error is taken. That is:

$$\hat\nabla[E(e^2(k,\ell))] = \nabla[e^2(k,\ell)] = 2e(k,\ell)\,\nabla[e(k,\ell)],$$

and again the adaptation scheme (filter coefficient updating process)
is used in the same fashion as in the nonrecursive case
[see Figure 3-5].
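Denoting the error at the jth iteration as e(j) = e(k,ℓ), then

$$A(j+1) = A(j) + 2k_a\,e(j)\,\nabla_A[e(j)]$$

$$B(j+1) = B(j) + 2k_b\,e(j)\,\nabla_B[e(j)] \tag{3-56}$$

where

$$e(k,\ell) = e(j) = y(k,\ell) - d(k,\ell) \tag{3-57}$$

The components of the vectors $\nabla_A[e(k,\ell)]$ and $\nabla_B[e(k,\ell)]$ can be
calculated as follows. From Equation (3-57),

$$\nabla_A[e(k,\ell)] = \nabla_A[y(k,\ell)] = \Big[\frac{\partial y(k,\ell)}{\partial a_{01}},\ \ldots,\ \frac{\partial y(k,\ell)}{\partial a_{0N}},\ \frac{\partial y(k,\ell)}{\partial a_{10}},\ \ldots,\ \frac{\partial y(k,\ell)}{\partial a_{1N}},\ \frac{\partial y(k,\ell)}{\partial a_{20}},\ \ldots,\ \frac{\partial y(k,\ell)}{\partial a_{MN}}\Big]^T \tag{3-58}$$

and

$$\nabla_B[e(k,\ell)] = \nabla_B[y(k,\ell)] = \Big[\frac{\partial y(k,\ell)}{\partial b_{00}},\ \ldots,\ \frac{\partial y(k,\ell)}{\partial b_{0q}},\ \frac{\partial y(k,\ell)}{\partial b_{10}},\ \ldots,\ \frac{\partial y(k,\ell)}{\partial b_{1q}},\ \frac{\partial y(k,\ell)}{\partial b_{20}},\ \ldots,\ \frac{\partial y(k,\ell)}{\partial b_{pq}}\Big]^T \tag{3-59}$$

Note that $\nabla_A[e(k,\ell)]$ and $\nabla_B[e(k,\ell)]$ have the same dimensions as
A and B, that is:

[(M+1)(N+1)-1]×1 and [(p+1)(q+1)]×1, respectively.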

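From the recursive relation of Equation (3-54),

$$\frac{\partial y(k,\ell)}{\partial a_{rs}} = -y(k-r,\ell-s) - \sum_{\substack{m=0 \\ (m,n)\neq(0,0)}}^{M}\sum_{n=0}^{N} a_{mn}\,\frac{\partial y(k-m,\ell-n)}{\partial a_{rs}} \tag{3-60}$$

and

$$\frac{\partial y(k,\ell)}{\partial b_{uv}} = x(k-u,\ell-v) - \sum_{\substack{m=0 \\ (m,n)\neq(0,0)}}^{M}\sum_{n=0}^{N} a_{mn}\,\frac{\partial y(k-m,\ell-n)}{\partial b_{uv}} \tag{3-61}$$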
The recursive relationship of Equations (3-60) and (3-61) should
be noted again; it can be implemented by additional recursive
filters.

Forming the instantaneous error gradient of Equations
(3-58), (3-59) using the output of the additional recursive filters
of Equations (3-60), (3-61), the filter coefficient adaptation
process of Equations (3-56), (3-57) can be performed. Note that
this algorithm corresponds to the two-dimensional version of the
algorithm proposed by White [3].

If we make the approximation

$$\frac{\partial y(k,\ell)}{\partial a_{rs}} = -y(k-r,\ell-s), \qquad \frac{\partial y(k,\ell)}{\partial b_{uv}} = x(k-u,\ell-v),$$

then it follows that

$$\nabla_A[e(k,\ell)] = -[\,y(k,\ell-1),\ y(k,\ell-2),\ \ldots,\ y(k-1,\ell),\ y(k-1,\ell-1),\ \ldots,\ y(k-2,\ell),\ \ldots,\ y(k-M,\ell-N)\,]^T = -Y$$
and

$$\nabla_B[e(k,\ell)] = [\,x(k,\ell),\ x(k,\ell-1),\ \ldots,\ x(k,\ell-q),\ x(k-1,\ell),\ \ldots,\ x(k-p,\ell-q)\,]^T = X$$

Therefore, in this case, the complete algorithm is described by

$$e(k,\ell) = e(j) = y(k,\ell) - d(k,\ell)$$

$$A(j+1) = A(j) - 2k_a\,e(j)\,Y$$

$$B(j+1) = B(j) + 2k_b\,e(j)\,X$$

and

$$y(k,\ell) = B^T X - A^T Y$$

This is the two-dimensional version of the algorithm proposed
by Feintuck [2].
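The complete two-dimensional algorithm above can be sketched as a raster
scan over the image. This is an illustration under assumptions not
stated in the text: the function name is hypothetical, pixels are
visited row by row so that every feedback sample is already available,
samples outside the image are taken as zero, and the test image and
desired response are synthetic.

```python
import random


def adapt_2d(x, d, p, q, M, N, k_a, k_b):
    """Simplified 2-D recursive adaptive filter (sketch); k_a, k_b < 0."""
    rows, cols = len(x), len(x[0])
    y = [[0.0] * cols for _ in range(rows)]
    B = [0.0] * ((p + 1) * (q + 1))
    A = [0.0] * ((M + 1) * (N + 1) - 1)

    def at(img, r, c):
        # Assumption: samples outside the image are taken as zero.
        return img[r][c] if r >= 0 and c >= 0 else 0.0

    for k in range(rows):
        for l in range(cols):
            X = [at(x, k - i, l - j)
                 for i in range(p + 1) for j in range(q + 1)]
            Y = [at(y, k - m, l - n) for m in range(M + 1)
                 for n in range(N + 1) if (m, n) != (0, 0)]
            # Output y(k,l) = B^T X - A^T Y and error e = y - d.
            y[k][l] = (sum(bv * xv for bv, xv in zip(B, X))
                       - sum(av * yv for av, yv in zip(A, Y)))
            e = y[k][l] - d[k][l]
            # Coefficient updates with the approximated gradients -Y and X.
            A = [av - 2.0 * k_a * e * yv for av, yv in zip(A, Y)]
            B = [bv + 2.0 * k_b * e * xv for bv, xv in zip(B, X)]
    return y, A, B


random.seed(3)
image = [[random.uniform(-1.0, 1.0) for _ in range(40)] for _ in range(40)]
desired = [[0.5 * v for v in row] for row in image]   # hypothetical target
_, A_hat, B_hat = adapt_2d(image, desired, p=0, q=0, M=1, N=1,
                           k_a=-0.05, k_b=-0.05)
```

Because the synthetic target is an exact gain of 0.5, the single
feedforward coefficient should approach 0.5 and the three feedback
coefficients should decay toward zero as the scan proceeds.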
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IV. ADAPTIVE NOISE CANCELLER 



A. THE CONCEPT OF ADAPTIVE NOISE CANCELLING 

Noise cancelling is a variation of optimal filtering that
is highly advantageous in many applications. Specifically,
Wiener filtering and Kalman filtering, which are optimal,
require a priori knowledge of both signal and noise statistics.
Adaptive filters, on the other hand, have the
ability to adjust their own parameters automatically, and their
design requires little or no a priori knowledge of signal and
noise statistics, while the Wiener approach utilizes a
fixed-parameter filter based on known statistics.

Figure (4-1) shows the basic problem and the adaptive
noise cancelling solution to it. It makes use of a reference
input derived from one (or more) sensors located at points
in the noise field where the signal is weak or undetectable.

This input is filtered and subtracted from a primary input
containing both signal and noise. As a result the primary noise
is attenuated or eliminated by cancellation.

At first glance, subtracting noise from a signal seems to 
be a dangerous procedure. If done improperly it could result 
in an increase in output noise power. If, however, filtering 
and subtraction are controlled by an appropriate adaptive 
process, noise reduction can be accomplished with little risk 
of distorting the signal or increasing the output noise level. 
FIGURE 4-2 NOISE CANCELLING WITHOUT AN EXTERNAL
REFERENCE SOURCE
The following argument for the above is mainly due to
Widrow, et al. [4]. In Figure (4-1), a signal S is transmitted
over a channel to a sensor that also receives a noise $N_0$.
A second sensor receives a noise $N_1$
uncorrelated with the signal but correlated in some
unknown way with the noise $N_0$. In addition to these noises,
additive random noises $M_0$ and $M_1$, uncorrelated with each other
and with S, $N_0$ and $N_1$, are present. Then the reference input
is

$$d = S + N_0 + M_0 \tag{4-1}$$

and the primary input

$$x = N_1 + M_1 \tag{4-2}$$

The noise $N_1 + M_1$ is filtered to produce an output y that is as
close a replica as possible of $N_0 + M_0$. This output is
subtracted from the reference input $S + N_0 + M_0$ to produce the
system output

$$z = S + N_0 + M_0 - y$$

In other words, the practical objective of the noise cancelling
system is to produce a system output $z = S + N_0 + M_0 - y$ that is
the best fit in the least squares sense to the signal S. This
objective is accomplished by feeding the system output back to the
adaptive filter and adjusting the filter through the LMS
adaptive algorithm (described in the previous chapter) to minimize
total system output power. Note that the system output serves
as the error signal for the adaptive process.
Assume for the moment that the noises $M_0$ and $M_1$ do not exist;
then if one knew the characteristics of the channels over
which the noise $N_0$ is transmitted to the reference input, it
would be possible theoretically to design a fixed filter capable
of changing $N_1$ into $N_0$. That is, if the correct model of this
transmission channel, H(z), is obtained, the adaptive filter
would be simply 1/H(z), a fixed filter.

Assume that S, $N_0$, $N_1$, $M_0$, $M_1$, and y are statistically
stationary and have zero means. Assume that S is uncorrelated
with $N_0$ and $N_1$ and that $M_0$ and $M_1$ are uncorrelated with each
other and with S, $N_0$ and $N_1$, and suppose that $N_1$ is correlated
with $N_0$.

The output z is

$$z = S + N_0 + M_0 - y \tag{4-3}$$

Squaring, one obtains

$$z^2 = S^2 + (N_0 + M_0 - y)^2 + 2S(N_0 + M_0 - y) \tag{4-4}$$

Taking expectations of both sides and realizing that S is
uncorrelated with $N_0$, $M_0$ and y, yields

$$E[z^2] = E[S^2] + E[(N_0 + M_0 - y)^2] + 2E[S(N_0 + M_0 - y)] = E[S^2] + E[(N_0 + M_0 - y)^2] \tag{4-5}$$

The signal power $E[S^2]$ will be unaffected as the filter is
adjusted to minimize $E[z^2]$. Accordingly, the minimum output power
is

$$\min E[z^2] = E[S^2] + \min E[(N_0 + M_0 - y)^2] \tag{4-6}$$

Since the filter is adjusted so that $E[z^2]$ is minimized,
$E[(N_0 + M_0 - y)^2]$ is therefore minimized. The filter output y is
then a best least squares estimate of the noise $N_0 + M_0$. Moreover,
when $E[(N_0 + M_0 - y)^2]$ is minimized, $E[(z - S)^2]$ is also
minimized, since, from (4-3),

$$z - S = M_0 + N_0 - y \tag{4-7}$$

Adjusting or adapting the filter to minimize the total output
power is thus equivalent to causing the output z to be a best
least squares estimate of the signal S for a given structure and
adjustability of the adaptive filter and for the given reference
input. The output z will contain the signal S plus noise.

From (4-3), the output noise is given by $(N_0 + M_0 - y)$. Since
minimizing $E[z^2]$ minimizes $E[(N_0 + M_0 - y)^2]$, minimizing
the total output power minimizes the output noise power. Since
the signal in the output remains constant, minimizing the total
output power maximizes the output signal-to-noise ratio. Note
that if $E[(N_0 + M_0 - y)^2] = 0$ can be achieved, then $E[z^2] = E[S^2]$,
and therefore $y = N_0 + M_0$ and $z = S$. In this case, minimizing output
power causes the output signal to be perfectly noise free. Also
note that, on the other hand, when the reference input is
completely uncorrelated with the primary input, the filter will "turn
itself off" and will not increase output noise.
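The decomposition (4-5), (4-6) can be illustrated numerically with a
one-coefficient filter. Everything specific in the sketch below is
hypothetical: the sinusoidal stand-in for S, the Gaussian noise, the
channel gain of 0.8 relating the two noises, and the absence of M_0 and
M_1. The variable names d and x follow (4-1) and (4-2).

```python
import math
import random

random.seed(1)
n = 20000
S = [math.sin(0.05 * j) for j in range(n)]        # stand-in signal
N1 = [random.gauss(0.0, 1.0) for _ in range(n)]   # noise fed to the filter, x
N0 = [0.8 * v for v in N1]                        # hypothetical channel gain 0.8
d = [s + v for s, v in zip(S, N0)]                # d = S + N0, with M0 = M1 = 0


def output_power(w):
    """Sample estimate of E[z^2] for z = d - w * x with a one-tap filter."""
    return sum((dj - w * xj) ** 2 for dj, xj in zip(d, N1)) / n


# Least-squares tap that minimizes the output power.
w_opt = sum(dj * xj for dj, xj in zip(d, N1)) / sum(xj * xj for xj in N1)
signal_power = sum(s * s for s in S) / n
```

Minimizing the output power drives the tap toward the channel gain, and
the remaining output power is then close to the signal power alone,
which is the content of (4-6) for this toy case.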

In this case, the filter output y will be uncorrelated with
the primary input. The output power will be

$$E[z^2] = E[(S + M_0 + N_0)^2] - 2E[y(S + N_0 + M_0)] + E[y^2] = E[(S + M_0 + N_0)^2] + E[y^2] \tag{4-8}$$

Therefore, minimizing output power requires that $E[y^2]$
be minimized, which is accomplished by making all weights
zero, bringing $E[y^2]$ to zero.

It should be noted that in applying adaptive techniques
to a practical systems problem, the key step lies in providing an
appropriate desired response signal for the adaptation process;
that is, the reference input should be provided through an
appropriate scheme, while exact knowledge of the statistical
characteristics is not required. In adaptive modeling
applications, the desired response is generally available as the
output of the unknown system to be modeled. Likewise, in the noise
cancelling scheme above, the reference input is available by
sensing noise which is correlated in some manner with the noise at
the primary input.

In the next section, the signal filtering problem is discussed
for the case when no external reference input free of signal is
available.

B. NOISE CANCELLING WITHOUT AN EXTERNAL REFERENCE INPUT

This section is concerned with filtering (estimating) a
noise-corrupted signal when no external reference input is
available. Here, it is assumed that only the noise-corrupted
signal is available; that is, referring to Figure (4-1) of the
previous section, the signal-free noise $N_1$, which is correlated
with the noise $N_0$ that corrupts the signal S, is not
available; only $S + N_0$ is available.

It is proposed to estimate the signal S by cancelling the
noise $N_0$ in some adaptive way. In the following, it is shown
how a reference input can be obtained for the adaptation process
under certain conditions. Assume that the noise-corrupted
signal x = S + N is composed of broadband noise N and a narrowband
signal S; then the autocorrelation function of the signal
is broad and that of the noise is narrow. Also assume that the noise
N is uncorrelated with the signal and that the mean values of
both signal and noise are zero.

Consider a signal delayed by δ units,

$$x(j-\delta) = S(j-\delta) + N(j-\delta) \tag{4-9}$$

where δ is a sufficient number of time units so that the noise
component is decorrelated, but the signal component still
remains correlated.

Then

$$E[N(j)\,N(j-\delta)] = 0$$

$$E[S(j)\,S(j-\delta)] \neq 0, \text{ finite} \tag{4-10}$$
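The two conditions in (4-10) can be illustrated with sample lag
correlations. The sketch below is only a demonstration: the function
name is hypothetical, the noise is taken as white Gaussian, a pure
cosine stands in for the narrowband signal, and a delay of one sample is
assumed to be sufficient.

```python
import math
import random


def lag_correlation(seq, lag):
    """Sample estimate of E[s(j) s(j - lag)] over the available samples."""
    terms = [seq[j] * seq[j - lag] for j in range(lag, len(seq))]
    return sum(terms) / len(terms)


random.seed(2)
count = 20000
noise = [random.gauss(0.0, 1.0) for _ in range(count)]   # broadband stand-in
signal = [math.cos(0.025 * j) for j in range(count)]     # narrowband stand-in

delta = 1
noise_corr = lag_correlation(noise, delta)     # close to zero
signal_corr = lag_correlation(signal, delta)   # stays near the signal power
```

The delayed noise is essentially uncorrelated with the undelayed noise,
while the delayed narrowband component remains strongly correlated, which
is what allows the delayed input to serve in the adaptation.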

For the two-dimensional signal, consider a signal delayed by $\delta_1$, $\delta_2$
units in the horizontal and vertical directions respectively,
where $\delta_1$ and $\delta_2$ are spatial delays long enough that
the noise field is decorrelated but the signal field still
remains correlated; then

$$E[N(k,\ell)\,N(k-\delta_1,\ell-\delta_2)] = 0$$

$$E[S(k,\ell)\,S(k-\delta_1,\ell-\delta_2)] \neq 0, \text{ finite} \tag{4-11}$$

and again it is assumed that the signal field and noise field are not
correlated with each other.

Now if this delayed signal is used as a primary input and
the original input is used as a reference input to the adaptive
filter, then referring to Figure (4-1) of the previous section,
$S(j-\delta)$ or $S(k-\delta_1,\ell-\delta_2)$ can be considered as $N_1$ in Figure (4-1)
and $N(j-\delta)$ or $N(k-\delta_1,\ell-\delta_2)$ as $M_1$, and $S(j)$ or $S(k,\ell)$ can be
considered as $N_0$ and $N(j)$ or $N(k,\ell)$ as $M_0$ in Figure (4-2),
respectively.

From equation (4-10) and the assumption that the signal and noise
are uncorrelated, it is seen that the assumptions made in the
last section for the various signals hold here.

Therefore, from the argument in Section IV-A, the
filter output would be a good estimate of the signal S. Figure
4-2 shows the noise cancelling (or signal estimation) scheme
discussed above.
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V. EXPERIMENT AND RESULTS 



In this chapter, a computer experiment is performed to
check the feasibility of the algorithms derived in Chapter
III for certain applications. The signal estimation problem
for a noise-corrupted signal is treated here for both
one-dimensional and two-dimensional cases. Nonrecursive adaptive
filtering and recursive filtering have been examined, and the
performances of the adaptive filters are compared to that of the
optimal Wiener solution. The adaptive noise cancelling scheme is
used for this application.

First, consider a band-limited one-dimensional signal S
corrupted by noise N; it is desired to estimate the signal.

If the statistics of both signal and noise are known a priori,
a fixed optimal filter to estimate the signal can be designed
by the Wiener-Hopf solution of equation (3-11).

Here it is assumed that these statistics are not known
a priori, but only that the signal is narrowband, the noise is
broadband, and the signal is entirely
uncorrelated with the noise. Then the signal has a wide
correlation function while the noise has a narrow correlation
function. Separation of this broadband noise and narrowband
signal is now required for the estimation of the signal.

It is assumed further that the desired (or reference) signal
which is needed for the adaptive process is not available; that
is, no other possible reference signal is available which may have
some correlation with the signal we want to estimate.

This problem can be considered as an adaptive noise
cancelling problem without reference input. Assuming that the
noise is white, then from Figure (4-2) one unit of delay is
enough to decorrelate the noise component appearing in the
adaptive filter input from the noise component in the desired
signal. These components will thus appear in the error but not in
the filter output. The narrowband component, on the other hand,
will not be decorrelated by the delay and will appear in the
adaptive filter output.

The input signal would be

x(j) = S(j) + N(j)

where S(j) = band-limited signal
N(j) = white noise

and the reference input would be

d(j) = S(j-1) + N(j-1).

The form of the autocorrelation function of the signal is assumed as

$$R(m) = \rho^{|m|} \cos \omega_0 m$$

For the purpose of computer simulation, the following values
are assigned:

$\rho$ = 0.95
$\omega_0$ = 0.025

and the variance of the noise is 0.5.
For the optimum filter design using the above values, a transversal
filter having 10 delays was used. From equation (3-11),

$$W_{LMS} = R_{xx}^{-1}\,R_{xd}$$

The autocorrelation matrix $R_{xx}$ was computed as
and the crosscorrelation matrix $R_{xd}$ as

$R_{xd}$ = [1.00000 0.94970 0.90137 0.85496 0.81044 0.76774 0.72604 0.68767 0.65020 0.61436]
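The listed cross-correlation values can be checked against the assumed
autocorrelation with ρ = 0.95 and ω_0 = 0.025, and a 10-tap solution of
W = R_xx^{-1} R_xd can be computed with a small linear solver. The
sketch below rests on assumptions that are not confirmed by the text:
the exponent form ρ^|m| cos(ω_0 m), the construction of R_xx as the
signal autocorrelation in Toeplitz form plus the noise variance 0.5 on
the diagonal, and R_xd(m) = R(m). The helper names are hypothetical, and
no claim is made that the resulting weights match the thesis values.

```python
import math


def signal_autocorr(m, rho=0.95, w0=0.025):
    """Assumed signal autocorrelation R(m) = rho^|m| cos(w0 m)."""
    return rho ** abs(m) * math.cos(w0 * m)


def solve(matrix, rhs):
    """Solve matrix * w = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    w = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (aug[i][n] - tail) / aug[i][i]
    return w


taps = 10
noise_variance = 0.5
r_xd = [signal_autocorr(m) for m in range(taps)]
R_xx = [[signal_autocorr(i - j) + (noise_variance if i == j else 0.0)
         for j in range(taps)] for i in range(taps)]
weights = solve(R_xx, r_xd)
```

The first few entries of `r_xd` agree with the listed vector (for
example 0.94970 and 0.90137), which supports the value ρ = 0.95 used in
the text.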



Then the optimum Wiener Hopf solution gives the filter co- 
efficients as 



W = Wiener Weight-Vector ~ 



/ \ 
0.332^7 
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0. 1347 5 
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0. C0370 

0.00567 
For simulation of the band-limited signal, the state and
output equations of example 1) in Section 2 of Chapter II are
used.

For the nonrecursive adaptive filter application, again
10 delays and a step size of -0.005 in the LMS algorithm were
used. In the recursive filter application, 2 delays for both the
feedforward and feedback paths and $k_a = k_b$ = -0.001
(equation (3-42)) were used for both Feintuck's algorithm and the
algorithm developed here. Eight points were used for the squared
error averaging in the gradient estimation (L = 8 in Equation
(3-40)). The experimental results are plotted in the following
figures, along with descriptions and the optimal solution for the
purpose of comparison. The results indicate that the adaptive
recursive filter appears to perform as well as the optimal Wiener
filter once it reaches a steady-state condition.
FIGURE R-1 SIGNAL STATISTICS
Autocorrelation = 0.96^m cos(0.025 m)
FIGURE R-2 NOISE CORRUPTED SIGNAL
NOISE: WHITE: Zero Mean
     : Variance = 0.5
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FIGURE R-3 WIENER-HOPF FILTERING 



10 Delays are used. 
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FIGURE R-4 WIDROW'S NONRECURSIVE ADAPTIVE 
FILTERING 

Number of delays used: 10 

Stepsize Used: -0.005 
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FIGURE R-5 FILTERING BY FEINTUCK'S ALGORITHM 

NUMBER OF DELAYS IN FEED FORWARD PATH: 2 

IN FEED BACK PATH : 2 



STEPSIZE USED: -0.001 
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FIGURE R-6 FILTERING BY THE ALGORITHM USING A 

FINITE POINT MOVING SQUARE ERROR AVERAGE 
For the estimation of gradient 
Number of Delays in Feed Forward Path: 2 

in Feed Back Path : 2 

Stepsize Used: -0.001 
As a second application, consider an image sensed by an image
sensing device with an N×N array of sensing elements. It is assumed
that this image is composed of a correlated background and a
three-diagonal-line target trajectory. This image may
be corrupted by the internal noise of the device (assumed white).
Then the output image includes three types of processes:

x(k,ℓ) = S(k,ℓ) + T(k,ℓ) + W(k,ℓ)

where

S(k,ℓ) = correlated background
T(k,ℓ) = target strength (three diagonal lines)
W(k,ℓ) = noise.

Again it is assumed that no statistics are known a priori
and that the correlated background is a narrowband signal. It is
further assumed that the correlated background and noise are
uncorrelated with each other. It is proposed to separate
the three diagonal lines from the background noise. Again, the
same argument holds that this problem is a two-dimensional
noise-cancelling problem in which no reference is available. It is
further assumed that the correlated background is a band-pass
process for which the autocorrelation function is

$$R_{ss}(m,n) = \rho_h^{|m|}\,\rho_v^{|n|}\,\cos\omega_h m\,\cos\omega_v n$$

where $\rho_h$, $\rho_v$ represent the horizontal and vertical direction
correlation coefficients respectively.
Going back to Figure (4-2) in the previous section, the delay
$z_1^{-1} z_2^{-1}$ will be sufficient to decorrelate the white noise
and diagonal line target. Then as a result of filtering, the
system output (or the residual field) will be the desired signal.
It should be noted that this residual signal is composed of
an estimate of the three diagonal lines and white
noise, as well as some granularity due to the fixed step size [1].

The problem of enhancing the target diagonal line which
is subjected to the noise (white noise and adaptation noise)
is another problem of interest. It will not be considered
in this work.

For the purpose of simulation, the following values
were used:

1) $\rho_v = \rho_h$ = 0.96, $\omega_h = \omega_v$ = 0.143

2) $\rho_v = \rho_h$ = 0.99, $\omega_h = \omega_v$ = 0.143

The variance of correlated background = 1.0
White noise variance = 0.1
Target diagonal line intensity = 1.8

For the optimal filter design, using the above values for
$\rho_v = \rho_h$ = 0.96, the Wiener-Hopf solution of Equation (3-30) is:

$$W_{LMS} = R_{xx}^{-1}\,R_{xd}$$

where $R_{xx}$, $R_{xd}$ are defined by equations (3-22), (3-28),
(3-29), respectively.

Using p = 3, q = 3 in equation (3-20),
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Only the recursive simulation is performed and compared with 
the nonrecursive Wiener filter. The simulation results are 
shown on the following pages and indicate that although the 
optimal performance of the Wiener filter is not achieved, 
the adaptive recursive filter performs well. 
FIGURE R-7 CORRELATED BACKGROUND + THREE STREAKS +
WHITE NOISE

Background: R(m,n) = 0.96^m 0.96^n cos(0.143 m) cos(0.143 n)
Variance = 1.0
White noise: zero mean
variance = 0.1
Streaks intensity: 1.8
FIGURE R-8 RESIDUAL SIGNAL AFTER WIENER FILTERING OF FIGURE R-7
White Noise + Streaks
FIGURE R-9 RESIDUAL SIGNAL AFTER RECURSIVE FILTERING
White Noise + Streaks + Adaptation Noise
FIGURE R-10 BACKGROUND + WHITE NOISE + STREAKS

Background: R(m,n) = 0.99^m 0.99^n cos(0.143 m) cos(0.143 n)
Variance = 1.0
White noise: zero mean
Variance = 0.1
Streaks Intensity: 1.8
FIGURE R-11 RESIDUAL SIGNAL AFTER WIENER FILTERING OF FIGURE R-10
White Noise + Streaks
FIGURE R-12 RESIDUAL SIGNAL AFTER ADAPTIVE RECURSIVE FILTERING
White Noise + Streaks + Adaptation Noise
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VI. CONCLUSIONS 



The study described herein has developed a new algorithm 
for the one-dimensional signal filtering problem and extends 
this to two-dimensional processing. It is an adaptive recur- 
sive filtering algorithm based on the steepest descent gradient 
method which employs the finite point square error for the 
gradient estimation rather than instantaneous square error. 

A simplified two-dimensional version of this algorithm
is developed. It is designed to estimate the signal in
real-time operation in cases where the statistics of both signal
and corrupting noise are not available a priori. The algorithm
learns the statistics and adapts even though it is not optimal,
which means that it seeks the minimum of the error criterion.

It should be noted that Widrow's nonrecursive adaptive filtering
algorithm gives the global minimum of the performance criterion,
because for a stationary process the mean square
error is a quadratic form in the weight vector; for the
recursive adaptive filter, however, local minima may be found
instead of the global minimum. The computer simulation shows that,
for the examples considered here, the algorithms presented learn the
statistics of the signal and adapt. Several points can be observed
from the experimental results [see Figures R-1 through R-12].

1) All the algorithms presented here give a satisfactory
result after the transients die out, even though they are
not optimal.

2) The algorithm which employs the finite-point average
square error for the gradient estimates converges more
rapidly than Feintuch's algorithm. A possible reason is
that the output information is fed back and used in the
filter coefficient updating process, and the sensitivity
information propagates through the recursive equation as the
iteration proceeds, whereas Feintuch's algorithm discards the
sensitivity information.

3) The algorithm developed here gives the best results among
the various algorithms presented, at the expense of more
complex hardware. Note that the number of additional
sensitivity filters required (equations (3-47), (3-48))
equals the number of filter coefficients, and the L-point
averaging process also requires additional storage elements.
A possible reason for the good results is that the averaging
process (equation (3-40)) gives a gradient estimate with a
smaller error relative to the true gradient than the estimate
based on the instantaneous square error, while both
estimates are unbiased.
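
The remark in point 3 about averaged versus instantaneous gradient estimates can be checked numerically in a deliberately simplified setting. The sketch below is an assumption-laden illustration, not the thesis's experiment: it uses a single fixed (non-adapting) nonrecursive weight `w`, white unit-variance input, and independent samples, so the variance reduction by a factor of about 1/L is an idealization. In the recursive adaptive filter the samples are correlated and the coefficients change within the window. All names and parameter values are hypothetical.

```python
import numpy as np


def gradient_estimates(n_trials=2000, window=16, w=0.3, w_true=1.0,
                       noise_std=0.5, seed=0):
    """Compare instantaneous and L-point averaged gradient estimates.

    Model (assumed for illustration): d = w_true * x + noise, filter
    output y = w * x, error e = d - y, with x white and unit variance.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n_trials, window))
    d = w_true * x + noise_std * rng.standard_normal((n_trials, window))
    e = d - w * x

    # Per-sample gradient of e^2 with respect to w.
    per_sample = -2.0 * e * x

    instantaneous = per_sample[:, -1]   # uses only the latest point
    averaged = per_sample.mean(axis=1)  # uses all L points in the window

    # Gradient of E[e^2] with respect to w, using E[x^2] = 1.
    true_gradient = -2.0 * (w_true - w)
    return true_gradient, instantaneous, averaged


if __name__ == "__main__":
    g, inst, avg = gradient_estimates()
    print("true gradient:        ", g)
    print("instantaneous mean/var:", inst.mean(), inst.var())
    print("averaged mean/var:     ", avg.mean(), avg.var())
```

Under these assumptions both estimates are centered on the true gradient, while the averaged estimate scatters much less from trial to trial, which is consistent with the explanation offered in point 3.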

Due to the emerging interest in adaptive recursive
filters, further research on this subject may be worthwhile.
The following topics are left open for further research:

1) Comparison of the steady-state performance of the
recursive adaptive filter with the Kalman filtering
technique. Expressing the filter coefficients in terms of
steady-state Kalman filter gains would lead to a better
understanding of the performance of the recursive filter.

2) Mathematical derivation of the bound on the step size of
the filter coefficient updating process for convergence
and stability. It is believed that this bound may be
obtained by first setting up constraints such that the
value of the performance criterion decreases
monotonically to a minimum as the iteration progresses.

3) Modification of the algorithms for the case in which partial
statistics of the signal or noise are available a priori.

4) Derivation of the algorithm based on a different
performance criterion, such as maximum likelihood ratio or
maximum signal-to-noise ratio.

5) Derivation of the algorithm based on different
minimization techniques, such as Newton's method or the
Fletcher-Powell method, for a given performance criterion.